Method for regulation of plant lignin composition

ABSTRACT

A method is disclosed for the regulation of lignin composition in plant tissue. Plants are transformed with a gene encoding an active F5H enzyme. The expression of the F5H gene results in increased levels of syringyl monomer, providing a lignin composition more easily degraded with chemicals and enzymes.

This invention was made with Government support under Contract No. DE-FG02-94ER20138 awarded by the Department of Energy. The Government has certain rights in this invention.

This Application is the National Stage of International Application No. PCT/US96/20094, filed Dec. 19, 1996, which claims the benefit of U.S. Provisional application Ser. No. 60/013,388, filed Mar. 14, 1996, and U.S. Provisional Application Ser. No. 60/009,119, filed Dec. 22, 1995.

FIELD OF THE INVENTION

The present method relates to the field of molecular biology and the regulation of protein synthesis through the introduction of foreign genes into plant genomes. More specifically, the method relates to the modification of plant lignin composition in a plant cell by the introduction of a foreign plant gene encoding an active ferulate-5-hydroxylase (F5H) enzyme. Plant transformants harboring the F5H gene demonstrate increased levels of syringyl monomer residues in their lignin, a trait that is thought to render the polymer more susceptible to delignification.

BACKGROUND

Lignin is one of the major products of the general phenylpropanoid pathway, and is one of the most abundant organic molecules in the biosphere (Crawford, (1981) Lignin Biodegradation and Transformation, New York: John Wiley and Sons). In nature, lignification provides rigidity to wood and is in large part responsible for the structural integrity of plant tracheary elements. Lignin is well suited to these capacities because of its physical characteristics and its resistance to biochemical degradation. Unfortunately, this same resistance to degradation has a significant impact on the utilization of lignocellulosic plant material (Whetten et al., Forest Ecol. Management 43, 301, (1991)).

The monomeric composition of lignin has significant effects on its chemical degradation during industrial pulping (Chiang et al., Tappi, 71, 173, (1988)). The guaiacyl lignins (derived from ferulic acid) characteristic of softwoods such as pine require substantially more alkali and longer incubations during pulping in comparison to the guaiacyl-syringyl lignins (derived from ferulic acid and sinapic acid) found in hardwoods such as oak. The reasons for the differences between these two lignin types have been explored by measuring the degradation of model compounds such as guaiacylglycerol-β-guaiacyl ether, syringylglycerol-β-guaiacyl ether, and syringylglycerol-β-(4-methylsyringyl) ether (Kondo et al., Holzforschung, 41, 83, (1987)) under conditions that mimic those used in the pulping process. In these experiments, the mono- and especially di-syringyl compounds were cleaved three to fifteen times faster than their corresponding diguaiacyl homologues. These model studies are in agreement with studies comparing the pulping of Douglas fir and sweetgum wood where the major differences in the rate of pulping occurred above 150° C. where arylglycerol-β-aryl ether linkages were cleaved (Chiang et al., Holzforschung, 44, 309, (1990)).

Another factor affecting chemical degradation of the two lignin forms may be the condensation of lignin-derived guaiacyl and syringyl residues to form diphenylmethane units. The presence of syringyl residues in hardwood lignins leads to the formation of syringyl-containing diphenylmethane derivatives that remain soluble during pulping, while the diphenylmethane units produced during softwood pulping are alkali-insoluble and thus remain associated with the cellulosic products (Chiang et al., Holzforschung, 44, 147, (1990); Chiang et al., Holzforschung, 44, 309, (1990)). Further, it is thought that the abundance of 5-5'-diaryl crosslinks that can occur between guaiacyl residues contributes to resistance to chemical degradation. This linkage is resistant to alkali cleavage and is much less common in lignin that is rich in syringyl residues because of the presence of the 5-O-methyl group in syringyl residues. The incorporation of syringyl residues results in what is known as "non-condensed lignin", a material that is significantly easier to pulp than condensed lignin.

Similarly, lignin composition and content in grasses is a major factor in determining the digestibility of lignocellulosic materials that are fed to livestock (Jung, H. G. & Deetz, D. A. (1993) Cell wall lignification and degradability in Forage Cell Wall Structure and Digestibility (H. G. Jung, D. R. Buxton, R. D. Hatfield, and J. Ralph eds.), ASA/CSSA/SSSA Press, Madison, Wis.). The incorporation of the lignin polymer into the plant cell wall prevents microbial enzymes from having access to the cell wall polysaccharides that make up the wall. As a result, these polysaccharides cannot be degraded and much of the valuable carbohydrates contained within animal feedstocks pass through the animals undigested. Thus, an increase in the dry matter of grasses over the growing season is counteracted by a decrease in digestibility caused principally by increased cell wall lignification. From these examples, it is clear that the modification of lignin monomer composition would be economically advantageous.

The problem to be overcome, therefore, is to develop a method for the creation of plants with increased levels of syringyl residues in their lignin to facilitate its chemical degradation. Modification of the enzyme pathway responsible for the production of lignin monomers provides one possible route to solving this problem.

The mechanism(s) by which plants control lignin monomer composition has been the subject of much speculation. As mentioned earlier, gymnosperms do not synthesize appreciable amounts of syringyl lignin. In angiosperms, syringyl lignin deposition is developmentally regulated: primary xylem contains guaiacyl lignin, while the lignin of secondary xylem and sclerenchyma is guaiacyl-syringyl lignin (Venverloo, Holzforschung 25, 18 (1971); Chapple et al., Plant Cell 4, 1413, (1992)). No plants have been found to contain purely syringyl lignin. It is still not clear how this specificity is controlled; however, at least five possible enzymatic control sites exist, namely caffeic acid/5-hydroxyferulic acid O-methyltransferase (OMT), F5H, (hydroxy)cinnamoyl-CoA ligase (4CL), (hydroxy)cinnamoyl-CoA reductase (CCR), and (hydroxy)cinnamoyl alcohol dehydrogenase (CAD). For example, the substrate specificities of OMT (Shimada et al., Phytochemistry, 22, 2657, (1972); Shimada et al., Phytochemistry, 12, 2873, (1973); Gowri et al., Plant Physiol, 97, 7, (1991); Bugos et al., Plant Mol. Biol. 17, 1203, (1992)) and CAD (Sarni et al., Eur. J Biochem., 139, 259, (1984); Goffner et al., Planta., 188, 48, (1992); O'Malley et al., Plant Physiol., 98, 1364, (1992)) are correlated with the differences in lignin monomer composition seen in gymnosperms and angiosperms, and the expression of 4CL isozymes (Grand et al., Physiol. Veg. 17, 433, (1979); Grand et al., Planta., 158, 225, (1983)) has been suggested to be related to the tissue specificity of lignin monomer composition seen in angiosperms.

Although there are at least five possible enzyme targets that could be exploited, only OMT and CAD have been investigated in recent attempts to manipulate lignin monomer composition in transgenic plants (Dwivedi et al., Plant Mol. Biol. 26, 61, (1994); Halpin et al., Plant J 6, 339, (1994); Ni et al., Transgen. Res. 3, 120 (1994); Atanassova et al., Plant J. 8, 465, (1995); Doorsselaere et al., Plant J 8, 855, (1995)). Most of these studies have focused on sense and antisense suppression of OMT expression. This approach has met with variable results, probably owing to the degree of OMT suppression achieved in the various studies. The most dramatic effects were seen by using homologous OMT constructs to suppress OMT expression in tobacco (Atanassova et al., supra) and poplar (Doorsselaere et al., supra). Both of these studies found that as a result of transgene expression, there was a decrease in the content of syringyl lignin and a concomitant appearance of 5-hydroxyguaiacyl residues. As a result of these studies, Doorsselaere et al., (WO 9305160) disclose a method for the regulation of lignin biosynthesis through the genomic incorporation of an OMT gene in either the sense or anti-sense orientation. In contrast, Dixon et al. (WO 9423044) demonstrate the reduction of lignin content in plants transformed with an OMT gene, rather than a change in lignin monomer composition. Similar research has focused on the suppression of CAD expression. The conversion of coniferaldehyde and sinapaldehyde to their corresponding alcohols in transgenic tobacco plants has been modified with the incorporation of an A. cordata CAD gene in anti-sense orientation (Hibino et al., Biosci. Biotechnol. Biochem., 59, 929, (1995)). A similar effort aimed at antisense inhibition of CAD expression generated a lignin with increased aldehyde content, but only a modest change in lignin monomer composition (Halpin et al., supra). This research has resulted in the disclosure of methods for the reduction of CAD activity using sense and anti-sense expression of a cloned CAD gene to effect inhibition of endogenous CAD expression in tobacco [Boudet et al., (U.S. Pat. No. 5,451,514) and Walter et al., (WO 9324638); Bridges et al., (CA 2005597)]. None of these strategies increased the syringyl content of lignin, a trait that is correlated with improved digestibility and chemical degradability of lignocellulosic material (Chiang et al., supra; Chiang and Funaoka, Holzforschung 44, 309 (1990); Jung et al., supra).

Although F5H is also a key enzyme in the biosynthesis of syringyl lignin monomers it has not been exploited to date in efforts to engineer lignin quality. In fact, since the time of its discovery over 30 years ago (Higuchi et al., Can. J Biochem. Physiol., 41, 613, (1963)) there has been only one demonstration of the activity of F5H published (Grand, C., FEBS Lett. 169, 7, (1984)). Grand demonstrated that F5H from poplar was a cytochrome P450-dependent monooxygenase (P450) as analyzed by the classical criteria of dependence on NADPH and light-reversible inhibition by carbon monoxide. Grand further demonstrated that F5H is associated with the endoplasmic reticulum of the cell. The lack of attention given to F5H in recent years may be attributed in general to the difficulties associated with dealing with membrane-bound enzymes, and specifically to the lability of F5H when treated with the detergents necessary for solubilization (Grand, supra). The most recent discovery surrounding the F5H gene has been made by Chapple et al., (supra) who reported a mutant of Arabidopsis thaliana L. Heynh named fah1 that is deficient in the accumulation of sinapic acid-derived metabolites, including the guaiacyl-syringyl lignin typical of angiosperms. This locus, termed FAH1, encodes F5H. The cloning of the gene encoding F5H would provide the opportunity to test the hypothesis that F5H is a useful target for the engineering of lignin monomer composition.

In spite of sparse information about F5H in the published literature, Applicant has been successful in the isolation, cloning, and sequencing of the F5H gene. Applicant has also demonstrated that the stable integration of the F5H gene into the plant genome, where the expression of the F5H gene is under the control of a promoter other than the gene's endogenous promoter, leads to an altered regulation of lignin biosynthesis.

SUMMARY OF THE INVENTION

The present invention provides isolated nucleic-acid fragments comprising the nucleotide sequences which correspond to SEQ ID NO.: 1 and SEQ ID NO.: 3 encoding an active plant F5H enzyme wherein the enzyme has the amino acid sequence encoded by the mature functional protein which corresponds to SEQ ID NO.: 2 and wherein the amino acid sequence encompasses amino acid substitutions, additions and deletions that do not alter the function of the F5H enzyme.

The invention further provides a chimeric gene causing altered guaiacyl:syringyl lignin monomer ratios in a transformed plant, the gene comprising a nucleic acid fragment encoding an active plant F5H enzyme operably linked in either sense or antisense orientation to suitable regulatory sequences. The nucleic acid fragments are those described above.

Also provided is a method of altering the activity of F5H in a plant by means of transforming plant cells in a whole plant with a chimeric gene causing altered guaiacyl:syringyl lignin monomer ratios in a transformed plant cell, wherein the gene is expressed; growing said plants under conditions that permit seed development; and screening the plants derived from these transformed seeds for those that express an active F5H gene or fragment thereof.

A method is provided of altering the activity of F5H enzyme in a plant by (i) transforming a cell, tissue or organ from a suitable host plant with the chimeric gene described above wherein the chimeric gene is expressed; (ii) selecting transformed cells, cell callus, somatic embryos, or seeds which contain the chimeric gene; (iii) regenerating whole plants from the transformed cells, cell callus, somatic embryos, or seeds selected in step (ii); (iv) selecting whole plants regenerated in step (iii) which have a phenotype characterized by (1) an ability of the whole plant to accumulate compounds derived from sinapic acid or (2) an altered syringyl lignin monomer content relative to an untransformed host plant.

The invention additionally provides a method of altering the composition of lignin in a plant by means of stably incorporating into the genome of the host plant by transformation a chimeric gene causing altered guaiacyl:syringyl lignin monomer ratios in a transformed plant; expressing the incorporated gene such that F5H is expressed and wherein guaiacyl:syringyl lignin monomer ratios are altered from those ratios of the untransformed host plant.

BRIEF DESCRIPTION OF THE FIGURES AND SEQUENCE LISTING

FIG. 1 illustrates the biosynthesis of monomeric lignin precursors via the general phenylpropanoid pathway.

FIG. 2 is an illustration of the pBIC20-F5H cosmid and the F5H overexpression construct (pGA482-35S-F5H) in which the F5H gene is expressed under the control of the constitutive cauliflower mosaic virus 35S promoter.

FIG. 3 shows an analysis of sinapic acid-derived secondary metabolites in wild type, the fah1-2 mutant, and independently-derived transgenic fah1-2 plants carrying the T-DNA derived from the pBIC20-F5H cosmid, or the pGA482-35S-F5H overexpression construct.

FIG. 4 shows the impact of F5H overexpression by comparing the steady state levels of F5H mRNA in wild type, the fah1-2 mutant, and independently-derived transgenic fah1-2 plants carrying the T-DNA derived from the 35S-F5H overexpression construct.

FIG. 5 shows a GC analysis of lignin nitrobenzene oxidation products to illustrate the impact of F5H overexpression on lignin monomer composition in the wild type, the fah1-2 mutant, and a fah1-2 mutant carrying the T-DNA derived from the 35S-F5H overexpression construct.

FIG. 6 illustrates a Southern blot analysis comparing hybridization of the F5H cDNA to EcoRI digested genomic DNA isolated from wild type Arabidopsis thaliana and a number of fah1 mutants.

FIG. 7 is a Northern blot analysis comparing hybridization of the F5H cDNA to RNA isolated from wild type Arabidopsis thaliana and a number of fah1 mutants.

FIG. 8 shows the genomic nucleotide (SEQ ID NO.: 3) and amino acid (SEQ ID NO.: 2) sequences of the Arabidopsis F5H gene and the F5H enzyme that it encodes.

Applicant(s) have provided three sequence listings in conformity with 37 C.F.R. 1.821-1.825 and Appendices A and B ("Requirements for Application Disclosures Containing Nucleotides and/or Amino Acid Sequences") and in conformity with "Rules for the Standard Representation of Nucleotide and Amino Acid Sequences in Patent Applications" and Annexes I and II to the Decision of the President of the EPO, published in Supplement No. 2 to OJ EPO, 12/1992.

The sequence of the Arabidopsis thaliana F5H cDNA is given in SEQ ID NO.: 1 and the sequence of the Arabidopsis thaliana F5H genomic clone is given in SEQ ID NO.: 3. The sequence of the F5H protein is given in SEQ ID NO.: 2.

DETAILED DESCRIPTION OF THE INVENTION

The present invention provides a gene that encodes F5H, a key enzyme in lignin biosynthesis. The invention further provides a method for altering the lignin composition in plants by transforming plants with the F5H gene wherein the gene is expressed and causes an increased conversion of ferulic acid to sinapic acid thereby increasing the syringyl content of the lignin polymer.

The effect in plants of lignin compositions containing higher syringyl monomer content is that the lignin is more susceptible to chemical delignification. This is of particular use in the paper and pulp industries where vast amounts of energy and time are consumed in the delignification process. Woody plants transformed with an active F5H gene would offer a significant advantage in the delignification process over conventional paper feedstocks. Similarly, modification of the lignin composition in grasses by the insertion and expression of a heterologous F5H gene offers a unique method for increasing the digestibility of livestock feed. Maximizing the digestibility of grasses in this manner offers great potential economic benefit to the farm and agricultural industries.

Plants to which the Invention may be Applied

The invention provides a gene and a chimeric gene construct useful for the transformation of plant tissue for the alteration of lignin monomer composition. Plants suitable in the present invention comprise plants that naturally lack syringyl lignin or those that accumulate lignin with a high guaiacyl:syringyl ratio. Plants suitable in the present invention also comprise plants whose lignin could be modified using antisense transformation constructs that reduce the syringyl content of the transgenic plants' lignin if such an alteration were desirable.

Suitable plants may include but are not limited to alfalfa (Medicago sp.), rice (Oryza sp.), maize (Zea mays), oil seed rape (Brassica sp.), forage grasses, and also tree crops such as eucalyptus (Eucalyptus sp.), pine (Pinus sp.), spruce (Picea sp.) and poplar (Populus sp.), as well as Arabidopsis sp. and tobacco (Nicotiana sp.).

Definitions

As used herein the following terms may be used for interpretation of the claims and specification.

The term "FAH1" refers to the locus or chromosomal location at which the F5H gene is encoded. The term "FAH1" also refers to the wild type allele of the gene encoding F5H. The term "fah1" refers to any mutant version of that gene that leads to an altered level of enzyme activity, syringyl lignin content or sinapate ester content that can be measured by thin layer chromatography, high performance liquid chromatography, or by in vivo fluorescence.

"Gene" refers to a nucleic acid fragment that expresses a specific protein, including regulatory sequences preceding (5' non-coding) and following (3' non-coding) the coding region. "Native" gene refers to the gene as found in nature with its own regulatory sequences.

A "chimeric gene" refers to a gene comprising heterogeneous regulatory and coding sequences.

An "endogenous gene" refers to the native gene normally found in its natural location in the genome.

A "foreign gene" or "transgene" refers to a gene not normally found in the host organism but one that is introduced by gene transfer.

The term "promoter" refers to a DNA sequence in a gene, usually upstream (5') to its coding sequence, which controls the expression of the coding sequence by providing the recognition site for RNA polymerase and other factors required for proper transcription. A promoter may also contain DNA sequences that are involved in the binding of protein factors which control the effectiveness of transcription initiation in response to physiological or developmental conditions.

The term "operably linked" refers to nucleic acid sequences on a single nucleic acid molecule which are associated so that the function of one is affected by the other.

As used herein, suitable "regulatory sequences" refer to nucleotide sequences located upstream (5'), within, and/or downstream (3') of a coding sequence, which control the transcription and/or expression of the coding sequences in conjunction with the protein biosynthetic apparatus of the cell. These regulatory sequences include promoters, translation leader sequences, transcription termination sequences, and polyadenylation sequences.

The term "T-DNA" refers to the DNA that is transferred into the plant genome from a T-DNA plasmid carried by a strain of Agrobacterium tumefaciens that is used to infect plants for the purposes of plant transformation.

The term "T-DNA plasmid" refers to a plasmid carried by Agrobacterium tumefaciens that carries an origin of replication, selectable markers such as antibiotic resistance, and DNA sequences referred to as right and left borders that are required for plant transformation. The DNA sequence that is transferred during this process is that which is located between the right and left T-DNA border sequences present on a T-DNA plasmid. The DNA between these borders can be manipulated in such a way that any desired sequence can be inserted into the plant genome.

The term "ferulate-5-hydroxylase" or "F5H" will refer to an enzyme in the plant phenylpropanoid biosynthetic pathway which catalyzes the conversion of ferulate to 5-hydroxyferulate and permits the production of sinapic acid and its subsequent metabolites, including sinapoylmalate and syringyl lignin.

The terms "encoding" and "coding" refer to the process by which a gene, through the mechanisms of transcription and translation, provides the information to a cell from which a series of amino acids can be assembled into a specific amino acid sequence to produce an active enzyme. It is understood that the process of encoding a specific amino acid sequence includes DNA sequences that may involve base changes that do not cause a change in the encoded amino acid, or which involve base changes which may alter one or more amino acids, but do not affect the functional properties of the protein encoded by the DNA sequence. It is therefore understood that the invention encompasses more than the specific exemplary sequences. Modifications to the sequence, such as deletions, insertions, or substitutions in the sequence which produce silent changes that do not substantially affect the functional properties of the resulting protein molecule are also contemplated. For example, alterations in the gene sequence which reflect the degeneracy of the genetic code, or which result in the production of a chemically equivalent amino acid at a given site, are contemplated. Thus, a codon for the amino acid alanine, a hydrophobic amino acid, may be substituted by a codon encoding another less hydrophobic residue, such as glycine, or a more hydrophobic residue, such as valine, leucine, or isoleucine. Similarly, changes which result in substitution of one negatively charged residue for another, such as aspartic acid for glutamic acid, or one positively charged residue for another, such as lysine for arginine, can also be expected to produce a biologically equivalent product. Nucleotide changes which result in alteration of the N-terminal and C-terminal portions of the protein molecule would also not be expected to alter the activity of the protein. In some cases, it may in fact be desirable to make mutants of the sequence in order to study the effect of alteration on the biological activity of the protein. Each of the proposed modifications is well within the routine skill in the art, as is determination of retention of biological activity in the encoded products. Moreover, the skilled artisan recognizes that sequences encompassed by this invention are also defined by their ability to hybridize, under stringent conditions (2X SSC, 0.1% SDS, 65° C.), with the sequences exemplified herein.
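
By way of illustration only, the distinction drawn above between silent base changes and substitutions yielding a chemically equivalent amino acid may be sketched in Python. The partial codon table is taken from the standard genetic code. The residue groupings and the function name classify_codon_change are assumptions introduced for this sketch; they follow the examples given in the preceding paragraph and are not a classification set out in the specification.

```python
# Partial standard genetic code (DNA codons). Only residues named in the
# text are included, so this table is deliberately incomplete.
CODON_TABLE = {
    "GCT": "Ala", "GCC": "Ala", "GCA": "Ala", "GCG": "Ala",
    "GGT": "Gly", "GGC": "Gly", "GGA": "Gly", "GGG": "Gly",
    "GTT": "Val", "GTC": "Val", "GTA": "Val", "GTG": "Val",
    "GAT": "Asp", "GAC": "Asp",
    "GAA": "Glu", "GAG": "Glu",
    "AAA": "Lys", "AAG": "Lys",
    "CGT": "Arg", "CGC": "Arg", "CGA": "Arg", "CGG": "Arg",
    "AGA": "Arg", "AGG": "Arg",
}

# Assumed grouping for illustration: the text treats alanine, glycine and
# valine as interchangeable, and pairs Asp/Glu and Lys/Arg by charge.
RESIDUE_GROUP = {
    "Ala": "aliphatic", "Gly": "aliphatic", "Val": "aliphatic",
    "Asp": "negative", "Glu": "negative",
    "Lys": "positive", "Arg": "positive",
}


def classify_codon_change(old_codon, new_codon):
    """Label a single codon replacement as silent, conservative or not."""
    old_residue = CODON_TABLE[old_codon]
    new_residue = CODON_TABLE[new_codon]
    if old_residue == new_residue:
        return "silent"            # degeneracy of the genetic code
    if RESIDUE_GROUP[old_residue] == RESIDUE_GROUP[new_residue]:
        return "conservative"      # chemically equivalent residue
    return "non-conservative"


print(classify_codon_change("GCT", "GCC"))  # Ala -> Ala
print(classify_codon_change("GAT", "GAA"))  # Asp -> Glu
print(classify_codon_change("GCT", "GAT"))  # Ala -> Asp
```

Whether a given conservative change actually preserves F5H activity would still have to be determined experimentally, as the paragraph above notes.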

The term "expression", as used herein, refers to the production of the protein product encoded by a gene. "Overexpression" refers to the production of a gene product in transgenic organisms that exceeds levels of production in normal or non-transformed organisms.

"Transformation" refers to the transfer of a foreign gene into the genome of a host organism and its genetically stable inheritance. Examples of methods of plant transformation include Agrobacterium-mediated transformation and particle-accelerated or "gene gun" transformation technology as described in U.S. Pat. No. 5,204,253.

The term "plasmid rescue" will refer to a technique for circularizing restriction enzyme-digested plant genomic DNA that carries T-DNA fragments bearing a bacterial origin of replication and antibiotic resistance (encoded by the β-lactamase gene of E. coli) such that this circularized fragment can be propagated as a plasmid in a bacterial host cell such as E. coli.

The term "lignin monomer composition" refers to the relative ratios of guaiacyl monomer and syringyl monomer found in lignified plant tissue.
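
As a minimal sketch of how the lignin monomer composition defined above might be expressed numerically, the following Python functions compute a guaiacyl:syringyl ratio and a syringyl mole percentage from the molar amounts of guaiacyl-derived and syringyl-derived products. The function names and the input values are hypothetical and are not data from this specification.

```python
def guaiacyl_to_syringyl_ratio(guaiacyl, syringyl):
    """Return the guaiacyl:syringyl ratio from molar amounts."""
    if guaiacyl < 0 or syringyl < 0:
        raise ValueError("amounts must be non-negative")
    if syringyl == 0:
        return float("inf")  # no syringyl monomer detected
    return guaiacyl / syringyl


def syringyl_mol_percent(guaiacyl, syringyl):
    """Return syringyl monomer as a percentage of total G + S monomer."""
    total = guaiacyl + syringyl
    if total <= 0:
        raise ValueError("total monomer amount must be positive")
    return 100.0 * syringyl / total


# Hypothetical amounts (arbitrary molar units), for illustration only.
print(guaiacyl_to_syringyl_ratio(80.0, 20.0))
print(syringyl_mol_percent(80.0, 20.0))
```

A lower guaiacyl:syringyl ratio, or equivalently a higher syringyl mole percentage, corresponds to the syringyl-enriched lignin that the invention seeks to produce.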

The Phenylpropanoid Biosynthetic Pathway

The lignin biosynthetic pathway is well researched and the principal pathways are illustrated in FIG. 1. Lignin biosynthesis is initiated by the conversion of phenylalanine into cinnamate through the action of phenylalanine ammonia lyase (PAL). The second enzyme of the pathway is cinnamate-4-hydroxylase (C4H), a cytochrome P450-dependent monooxygenase (P450) which is responsible for the conversion of cinnamate to p-coumarate. The second hydroxylation of the pathway is catalyzed by a relatively ill-characterized enzyme, p-coumarate-3-hydroxylase (C3H), whose product is caffeic acid. Caffeic acid is subsequently O-methylated by OMT to form ferulic acid, a direct precursor of lignin. The last hydroxylation reaction of the general phenylpropanoid pathway is catalyzed by F5H. The 5-hydroxyferulate produced by F5H is then O-methylated by OMT, the same enzyme that carries out the O-methylation of caffeic acid. This dual specificity of OMT has been confirmed by the cloning of the OMT gene, and expression of the protein in E. coli (Bugos et al., (1991) supra; Gowri et al., (1991) supra).
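
The sequence of reactions just described can be represented, purely for illustration, as an ordered list of substrate, enzyme and product. The Python sketch below is a simplified linear model and not a representation of FIG. 1; the names PATHWAY and reachable_metabolites are hypothetical. It shows why loss of F5H activity leaves ferulate as the last metabolite formed, so that sinapate is not produced.

```python
# Linear model of the steps named in the text: (substrate, enzyme, product).
PATHWAY = [
    ("phenylalanine", "PAL", "cinnamate"),
    ("cinnamate", "C4H", "p-coumarate"),
    ("p-coumarate", "C3H", "caffeate"),
    ("caffeate", "OMT", "ferulate"),
    ("ferulate", "F5H", "5-hydroxyferulate"),
    ("5-hydroxyferulate", "OMT", "sinapate"),
]


def reachable_metabolites(start, inactive_enzymes=frozenset()):
    """List metabolites formed from `start` until an inactive enzyme is met."""
    metabolites = [start]
    current = start
    for substrate, enzyme, product in PATHWAY:
        if substrate != current:
            continue              # step lies upstream of the current metabolite
        if enzyme in inactive_enzymes:
            break                 # pathway is blocked at this step
        metabolites.append(product)
        current = product
    return metabolites


wild_type = reachable_metabolites("phenylalanine")
f5h_deficient = reachable_metabolites("phenylalanine", {"F5H"})
print(wild_type[-1])       # last metabolite formed with all enzymes active
print(f5h_deficient[-1])   # last metabolite formed without F5H activity
```

The model ignores branch points and the later steps catalyzed by 4CL, CCR and CAD, which are outside the scope of this sketch.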

The committed steps of lignin biosynthesis are catalyzed by 4CL, (hydroxy)cinnamoyl CoA reductase (CCR) and CAD, which ultimately generate coniferyl alcohol from ferulic acid and sinapoyl alcohol from sinapic acid. Coniferyl alcohol and sinapoyl alcohol are polymerized by extracellular oxidases to yield guaiacyl lignin and syringyl lignin respectively, although syringyl lignin is more accurately described as a co-polymer of both monomers.

Although ferulic acid, sinapic acid, and in some cases p-coumaric acid are channeled into lignin biosynthesis, in some plants these compounds are precursors for other secondary metabolites. In Arabidopsis, sinapic acid serves as a precursor for lignin biosynthesis but it is also channeled into the synthesis of soluble sinapic acid esters. In this pathway, sinapic acid is converted to sinapoylglucose which serves as an intermediate in the biosynthesis of sinapoylmalate (FIG. 5). Sinapic acid and its esters are fluorescent and may be used as a marker of plants deficient in those enzymes needed to produce sinapic acid (Chapple et al., supra).

Identification of the FAH1 Locus and fah1 Alleles

A series of mutants of Arabidopsis that fail to accumulate sinapoylmalate have been identified and have been collectively termed fah1 mutants. The fluorescent nature of sinapoylmalate permits the facile identification of sinapic acid esters by thin layer chromatography (TLC) followed by observation under ultraviolet (UV) light. The fluorescence of sinapoylmalate can also be visualized in vivo because sinapoylmalate is accumulated in the adaxial leaf epidermis. Wild type Arabidopsis exhibits a pale blue fluorescence under UV while fah1 mutants appear dark red because of the lack of the blue fluorescence of sinapoylmalate and the fluorescence of chlorophyll in the subtending mesophyll (Chapple et al., supra).

A TLC-based mutant screen of 4,200 ethyl methanesulfonate (EMS)-mutagenized Arabidopsis plants identified a number of independent mutant lines that accumulated significantly lower levels of sinapoylmalate. The mutations in these lines were identified as fah1-1 through fah1-5. The in vivo UV-fluorescence visual screen was used to identify more mutant lines carrying the fah1 mutation. Two of these mutants (fah1-6 and fah1-7) were selected from EMS-mutagenized populations. One mutant line (fah1-8) was selected from among a mutant population generated by fast-neutron bombardment (Nilan, R. A. Nucl. Sci. Abstr., 28(3), 5940 (1973); Kozer et al., Genet. Pol., 26(3), 367, (1985)). A final mutant line (fah1-9) was identified using the same technique from a T-DNA tagged population of plants. Before further analysis, each mutant line was backcrossed at least twice to the wild type and homozygous lines were established.

To determine whether the newly isolated mutant lines were defective at the same locus, that is, within the gene encoding F5H, genetic complementation experiments were performed. In these tests, each mutant line was crossed to fah1-2 which is known to be defective in F5H. In each case, the newly isolated mutant line was used as the female parent and was fertilized with pollen from a fah1-2 homozygous mutant. A reciprocal cross was also performed using fah1-2 as the female parent, and the new mutant line as the pollen donor. The seeds from these crosses were collected several weeks later, and were planted for subsequent analysis. The progeny were analyzed for sinapoylmalate production by TLC, high pressure liquid chromatography and by observation under UV light. From these crosses, all of the F1 progeny examined were sinapoylmalate-deficient, indicating that all of the mutations identified were allelic.
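
The reasoning behind the complementation test may be summarized in a short Python sketch. It assumes, as is conventional for such tests, that both parental mutations are recessive and that each F1 plant is scored simply as "deficient" or "wild type"; the function name interpret_complementation and the scoring labels are hypothetical.

```python
def interpret_complementation(f1_phenotypes):
    """Interpret F1 phenotypes from a cross between two recessive mutants.

    If every F1 plant is still deficient, the two mutations fail to
    complement and are taken to lie in the same gene (allelic). If every
    F1 plant is wild type, the mutations complement and lie in different
    genes. A mixture is reported as inconclusive.
    """
    if not f1_phenotypes:
        raise ValueError("no F1 progeny were scored")
    if all(p == "deficient" for p in f1_phenotypes):
        return "allelic"
    if all(p == "wild type" for p in f1_phenotypes):
        return "non-allelic"
    return "inconclusive"


# Hypothetical scoring of twelve F1 plants from one cross.
print(interpret_complementation(["deficient"] * 12))
```

Applied to the crosses described above, in which all F1 progeny were sinapoylmalate-deficient, this logic returns "allelic" for each new mutant line.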

The fah1-9 line was selected for further study because of the presence of the T-DNA insertion within the F5H gene. The T-DNA insertion within the FAH1 locus facilitated the cloning of the flanking Arabidopsis DNA which could then be used to retrieve the wild type F5H gene from cDNA and genomic libraries (Meyer et al., Proc. Natl. Acad. Sci. USA, 93, 6869 (1996)).

Cloning of the FAH1 Locus

A fragment of DNA from the FAH1 locus was isolated from the T-DNA tagged fah1-9 mutant using the technique of plasmid rescue (Meyer et al., supra). The technique of plasmid rescue is common and well known in the art and may be used to isolate specific alleles from T-DNA transformed plants (Behringer, et al., Plant Mol. Biol. Rep., 10, 190, (1992)). Briefly, the vector used to generate the T-DNA tagged population of Arabidopsis carries sequences required for autonomous replication of DNA in bacteria and sequences that confer antibiotic resistance. Once this DNA is integrated into the plant genome, specific restriction endonuclease digests can be employed to generate fragments that can be circularized, ligated, and transformed into E. coli. Circularized DNA from the T-DNA will generate functional plasmids that confer antibiotic resistance to their bacterial hosts such that they can be identified by growth on selective media. Those plasmids that are generated from the sequences including the right and left borders will also carry with them the plant genomic sequences flanking the T-DNA insertion. Plasmids generated from either of the T-DNA borders that carry flanking DNA sequences can be identified by analyzing the products of diagnostic restriction enzyme digests on agarose gels. The plasmids with flanking sequences can then serve as a starting point for cloning plant sequences that share homology to the DNA at the point of T-DNA insertion (Behringer, et al., supra).

Plasmid rescue was conducted using EcoRI-digested DNA prepared from homozygous fah1-9 plants. EcoRI-digested genomic DNA was ligated and then electroporated into competent DH5α E. coli. DNA from rescued plasmids was further digested with both EcoRI and SalI and the digests were analyzed by gel electrophoresis to identify plasmids that contained flanking Arabidopsis DNA. A SacII-EcoRI fragment from this rescued plasmid was used to identify an F5H clone from an Arabidopsis cDNA library (Newman, T. et al., Plant. Physiol. 106, 1241, (1994)).
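
The digestion and selection logic of plasmid rescue can be illustrated with a toy Python model. EcoRI recognizes GAATTC and cuts after the first G on the top strand; everything else in the sketch is hypothetical, including the short marker strings that stand in for the bacterial origin of replication and the antibiotic resistance gene, and the made-up sequence itself. Only a fragment carrying both markers could propagate as a plasmid after circularization.

```python
ECORI_SITE = "GAATTC"
ECORI_CUT_OFFSET = 1  # top-strand cut lies between G and AATTC


def digest(sequence, site=ECORI_SITE, cut_offset=ECORI_CUT_OFFSET):
    """Cut a linear top-strand sequence at every occurrence of `site`."""
    fragments = []
    start = 0
    position = sequence.find(site)
    while position != -1:
        cut = position + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        position = sequence.find(site, position + 1)
    fragments.append(sequence[start:])
    return fragments


def rescuable(fragments, origin_marker, resistance_marker):
    """Keep fragments that carry both elements needed to act as a plasmid."""
    return [f for f in fragments
            if origin_marker in f and resistance_marker in f]


# Hypothetical placeholders, not real origin or resistance gene sequences.
ORIGIN = "ATATATAT"
RESISTANCE = "CGCGCGCG"
# Toy genome: plant DNA, EcoRI site, flanking DNA plus T-DNA elements,
# a second EcoRI site, then more plant DNA.
genome = "CCCC" + ECORI_SITE + "TTTT" + ORIGIN + RESISTANCE + "AAAA" + ECORI_SITE + "GGGG"

fragments = digest(genome)
rescued = rescuable(fragments, ORIGIN, RESISTANCE)
print(len(fragments), rescued)
```

In this toy example the rescued fragment also contains the short stretches adjoining the marker sequences, which is the property that makes the technique useful for recovering plant DNA flanking a T-DNA insertion.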

DNA Sequencing of the F5H cDNA and genomic clones

Sequence analysis of the F5H cDNA and genomic clones was performed on plasmid DNA manually using a United States Biochemical Sequenase Kit v. 2.0, on a DuPont Genesis® 2000 sequencer or on an Applied Biosystems 373A DNA sequencer, using standard vector-based sequencing oligonucleotides or custom-synthesized oligonucleotides as appropriate. The sequence of the Arabidopsis thaliana F5H cDNA is given in SEQ ID NO.: 1 and the sequence of the Arabidopsis thaliana F5H genomic clone is given in SEQ ID NO.: 3.

The F5H cDNA contains a 1560 bp open reading frame that encodes a protein with a molecular weight of 58,728. The putative ATG initiation codon is flanked by an A at -3 and a G at +4, in keeping with the nucleotides commonly found flanking the initiator methionine in plant mRNAs (Lutcke et al., EMBO J. 6, 43, (1987)). Immediately following the inferred initiator methionine is a 17 amino acid sequence containing nine hydroxy amino acids (FIG. 8). The subsequent fifteen amino acid sequence is rich in hydrophobic amino acids, with eleven of its residues being phenylalanine, isoleucine, leucine or valine. This hydrophobic stretch is immediately followed by an Arg-Arg-Arg-Arg putative stop transfer sequence. F5H also shares significant sequence identity with other P450s. Most notable is the stretch between Pro-450 and Gly-460. This region contains eight residues that comprise the heme-binding domain and are highly conserved among most P450s, one exception being allene oxide synthase from Linum usitatissimum (Song et al., Proc. Natl. Acad. Sci. USA 90, 8519, (1993)). The Pro-450 to Gly-460 region contains Cys-458 in F5H, which by analogy is most likely the heme binding ligand in this enzyme.
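The start-codon context described above can be checked directly against the sequence listing. The following is a minimal sketch only: it uses the first 30 nucleotides of SEQ ID NO.: 1 as they appear in the listing, the function name is hypothetical, and the treatment of the 1560 bp figure as including the stop codon is an assumption that the specification does not state.

```python
# First 30 nucleotides of SEQ ID NO.: 1, copied from the sequence listing.
CDNA_5PRIME = "aaaaaaaacactcaatatggagtcttctat"


def start_codon_context(seq):
    """Return (1-based position of the A of the first ATG, base at -3, base at +4)."""
    s = seq.upper()
    i = s.find("ATG")  # 0-based index of the A in the first ATG
    if i < 3 or i + 3 >= len(s):
        raise ValueError("no ATG with enough flanking sequence")
    # With the A of ATG numbered +1, position -3 lies three bases upstream
    # and position +4 is the base immediately after the G of ATG.
    return i + 1, s[i - 3], s[i + 3]


position, minus3, plus4 = start_codon_context(CDNA_5PRIME)

# Assumption (not stated in the text): the 1560 bp reading frame
# is counted including the stop codon.
ORF_BP = 1560
codons = ORF_BP // 3
residues = codons - 1

print(position, minus3, plus4, codons, residues)
```

Under these assumptions the first ATG begins at nucleotide 17 of the cDNA and carries the A at -3 and G at +4 noted in the text.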

Transformation of fah1-2 Arabidopsis and Restoration of Sinapoylmalate Accumulation

The identity of the F5H gene was confirmed by complementation of the fah1-2 mutant with a genomic clone and a construct where the F5H genomic coding sequence was expressed under the control of the cauliflower mosaic virus 35S promoter. Briefly, the F5H cDNA was used as a probe to screen a transformation competent library (Meyer et al., (1994) Science, 264, 1452-1455) for genomic clones. Using this method, a cosmid clone (pBIC20-F5H) was isolated that carried a 17 kb genomic insert containing the inferred start and stop codons of the F5H gene (FIG. 2). The portion of this cosmid carrying the F5H open reading frame was excised from the cosmid and subcloned into a vector in which it was operably linked to the cauliflower mosaic virus 35S promoter (pGA482-35S-F5H) (FIG. 2). Both the original cosmid and this derivative plasmid construct were electroporated into Agrobacterium tumefaciens and were used to transform fah1-2 mutants. Success of the transformations was evidenced by TLC assays demonstrating sinapoylmalate accumulation in leaf tissues of the fah1-2 transformants carrying the T-DNA from the pBIC20-F5H cosmid or the pGA482-35S-F5H plasmid (FIG. 3). These data clearly indicated that the gene encoding F5H had been identified.

Modification of Lignin Composition in Plants Transformed With F5H Under the Control of the Cauliflower Mosaic Virus 35S Promoter

Arabidopsis plants homozygous for the fah1-2 allele were transformed with Agrobacterium carrying the pGA482-35S-F5H plasmid, which contains the chimeric F5H gene under the control of the constitutive cauliflower mosaic virus 35S promoter (Odell, et al., Nature 313, 810-812, (1985)). Independent homozygous transformants carrying the F5H transgene at a single genetic locus were identified by selection on kanamycin-containing growth media and grown in soil, and plant tissue was analyzed for lignin monomer composition. Nitrobenzene oxidation analysis of the lignin in wild type, fah1-2, and transformants carrying the T-DNA from the pGA482-35S-F5H construct revealed that F5H overexpression, as measured by northern blot analysis (FIG. 4), led to a significant increase in the syringyl content of the transgenic lignin (FIG. 5). The lignin of the F5H-overexpressing plants demonstrated a syringyl content as high as 29 mol % as opposed to the syringyl content of the wild type lignin, which was 18 mol % (Table 1) (Example 5). These data clearly demonstrate that overexpression of the F5H gene is useful for the alteration of lignin composition in transgenic plants.

TABLE 1
Impact of 35S Promoter-Driven F5H Expression on Lignin Monomer Composition in Arabidopsis
______________________________________
Line         mol % S
______________________________________
wild type    18.4 +/- 0.91
88           5.06 +/- 0.17
172          13.7 +/- 0.55
170          19.2 +/- 0.56
122          19.9 +/- 0.86
108          22.7 +/- 0.82
107          25.3 +/- 1.23
180          25.8 +/- 0.78
117          28.8 +/- 0.92
128          27.5 +/- 1.80
______________________________________

In a similar fashion, T1 tobacco (Nicotiana tabacum) F5H transformants were generated, grown up and analyzed for lignin monomer composition. Nitrobenzene oxidation analysis demonstrated that the syringyl monomer content of the leaf midribs was increased from 14 mol % in the wild type to 40 mol % in the transgenic line that most highly expressed the F5H transgene (Table 2).

TABLE 2
Impact of 35S Promoter-Driven F5H Expression on Lignin Monomer Composition in Tobacco Leaf Midrib Xylem
______________________________________
Line         mol % S
______________________________________
wild type    14.3 +/- 1.09
40           22.4 +/- 1.53
27           31.3 +/- 0.50
48           35.7 +/- 6.06
33           40.0 +/- 1.86
______________________________________
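The comparisons drawn from Tables 1 and 2 can be reproduced with a short calculation. The sketch below is illustrative only: the values are transcribed from the two tables, the nature of the reported +/- value is not specified in the text and is not used, and the helper name `summarize` is hypothetical.

```python
# mol % syringyl (mean, reported +/- value) transcribed from Tables 1 and 2.
ARABIDOPSIS = {
    "wild type": (18.4, 0.91), "88": (5.06, 0.17), "172": (13.7, 0.55),
    "170": (19.2, 0.56), "122": (19.9, 0.86), "108": (22.7, 0.82),
    "107": (25.3, 1.23), "180": (25.8, 0.78), "117": (28.8, 0.92),
    "128": (27.5, 1.80),
}
TOBACCO = {
    "wild type": (14.3, 1.09), "40": (22.4, 1.53), "27": (31.3, 0.50),
    "48": (35.7, 6.06), "33": (40.0, 1.86),
}


def summarize(table):
    """Compare each transformant line's mean mol % S with the wild type mean."""
    wild_type = table["wild type"][0]
    means = {line: v[0] for line, v in table.items() if line != "wild type"}
    top = max(means, key=means.get)
    return {
        "top_line": top,
        "top_mol_pct": means[top],
        "fold_vs_wild_type": means[top] / wild_type,
        "lines_above_wild_type": sorted(l for l, m in means.items() if m > wild_type),
    }


arabidopsis_summary = summarize(ARABIDOPSIS)
tobacco_summary = summarize(TOBACCO)
print(arabidopsis_summary)
print(tobacco_summary)
```

On these figures the highest Arabidopsis line (117) carries roughly 1.6 times the wild type syringyl content and the highest tobacco line (33) roughly 2.8 times.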

Construction of Chimeric Genes for the Expression of F5H in Plants.

The expression of foreign genes in plants is well-established (De Blaere et al. (1987) Meth. Enzymol. 143:277-291) and this invention provides for a method to apply this technology to the introduction of a chimeric gene for the overexpression of the F5H gene in plants for the manipulation of lignin monomer composition. The expression of the F5H mRNAs at an appropriate level may require the use of different chimeric genes utilizing different promoters. A preferred class of heterologous hosts for the expression of the coding sequence of the F5H gene are eukaryotic hosts, particularly the cells of higher plants. Particularly preferred among the higher plants and the seeds derived from them are alfalfa (Medicago sp.), rice (Oryza sp.), maize (Zea mays), oil seed rape (Brassica sp.), forage grasses, and also tree crops such as eucalyptus (Eucalyptus sp.), pine (Pinus sp.), spruce (Picea sp.) and poplar (Populus sp.), as well as Arabidopsis sp. and tobacco (Nicotiana sp.). Expression in plants will use regulatory sequences functional in such plants.

The origin of the promoter chosen to drive the expression of the coding sequence is not critical as long as it has sufficient transcriptional activity to accomplish the invention by expressing translatable mRNA for the F5H gene in the desired host tissue. Preferred promoters will effectively target F5H expression to those tissues that undergo lignification. These promoters may include, but are not limited to, promoters of genes encoding enzymes of the phenylpropanoid pathway such as the PAL promoter (Ohl et al., Plant Cell, 2, 837, (1990)) and the 4CL promoter (Hauffe et al., Plant Cell, 3, 435, (1991)).

Depending upon the application, it may be desirable to select promoters that are specific for expression in one or more organs of the plant. Examples include the light-inducible promoters of the small subunit of ribulose 1,5-bisphosphate carboxylase, if the expression is desired in photosynthetic organs, or promoters active specifically in roots.

Expression of F5H Chimeric Genes in Plants

Various methods of introducing a DNA sequence (i.e., of transforming) into eukaryotic cells of higher plants are available to those skilled in the art (see EPO publications 0 295 959 A2 and 0 138 341 A1). Such methods include those using transformation vectors based on the Ti and Ri plasmids of Agrobacterium spp. It is particularly preferred to use the binary type of these vectors. Ti-derived vectors transform a wide variety of higher plants, including monocotyledonous and dicotyledonous plants, such as soybean, cotton, tobacco, Arabidopsis and rape (Facciotti et al., Bio/Technology 3, 241, (1985); Byrne et al., Plant Cell, Tissue and Organ Culture 8, 3, (1987); Sukhapinda et al., Plant Mol. Biol. 8, 209, (1987); Lorz et al., Mol. Gen. Genet. 199, 178, (1985); Potrykus, Mol. Gen. Genet. 199, 183, (1985)).

For introduction into plants the chimeric genes of the invention can be inserted into binary vectors as described in Example 5.

Other transformation methods are available to those skilled in the art, such as direct uptake of foreign DNA constructs (see EPO publication 0 295 959 A2), techniques of electroporation (see Fromm et al. (1986) Nature (London) 319:791) or high-velocity ballistic bombardment with metal particles coated with the nucleic acid constructs (see Klein et al., Nature (London) 327:70 (1987), and see U.S. Pat. No. 4,945,050). Once transformed, the cells can be regenerated by those skilled in the art.

The following Examples are meant to illustrate key embodiments of the invention but should not be construed to be limiting in any way.

EXAMPLES

GENERAL METHODS

Restriction enzyme digestions, phosphorylations, ligations and transformations were done as described in Sambrook et al., Molecular Cloning: A Laboratory Manual, Second Edition (1989) Cold Spring Harbor Laboratory Press. All reagents and materials used for the growth and maintenance of bacterial cells were obtained from Aldrich Chemicals (Milwaukee, Wis.), DIFCO Laboratories (Detroit, Mich.), GIBCO/BRL (Gaithersburg, Md.), or Sigma Chemical Company (St. Louis, Mo.) unless otherwise specified.

The meaning of abbreviations is as follows: "h" means hour(s), "min" means minute(s), "sec" means second(s), "d" means day(s), "μL" means microliter(s), "mL" means milliliter(s), "L" means liter(s), "g" means gram(s), "mg" means milligram(s), "μg" means microgram(s), "nm" means nanometer(s), "m" means meter(s), "E" means Einstein(s).

Plant material

Arabidopsis thaliana was grown under a 16 h light/8 h dark photoperiod at 100 μE m⁻² s⁻¹ at 24° C. and cultivated in METROMIX 2000 potting mixture (Scotts, Marysville, Ohio). Mutant lines fah1-1 through fah1-5 were identified by TLC as described below. Using their red fluorescence under UV light as a marker, mutant lines fah1-6, fah1-7, and fah1-8 were selected from ethylmethane sulfonate (fah1-6, fah1-7) or fast-neutron (fah1-8) mutagenized populations of Landsberg erecta M2 seed. The T-DNA tagged line 3590 (fah1-9) was similarly identified in the DuPont T-DNA tagged population (Feldmann, K. A., Malmberg, R. L., & Dean, C., (1994) Mutagenesis in Arabidopsis, in Arabidopsis, (E. M. Meyerowitz and C. R. Somerville, eds.) Cold Spring Harbor Press). All lines were backcrossed to wild type at least twice prior to experimental use to remove unlinked background mutations.

Secondary Metabolite Analysis

Leaf extracts were prepared from 100 mg samples of fresh leaf tissue suspended in 1 mL of 50% methanol. Samples were vortexed briefly, then frozen at -70° C. Samples were thawed, vortexed, and centrifuged at 12,000×g for 5 min. Sinapoylmalate content was qualitatively determined following silica gel TLC, in a mobile phase of n-butanol/ethanol/water (4:1:1). Sinapic acid and its esters were visualized under long wave UV light (365 nm) by their characteristic fluorescence.

Southern Analysis

For Southern analysis, DNA was extracted from leaf material (Rogers, et al., (1985) Plant Mol. Biol. 5, 69), digested with restriction endonucleases and transferred to HYBOND N⁺ membrane (Amersham, Cleveland, Ohio) by standard protocols. cDNA probes were radiolabelled with ³²P and hybridized to the target membrane in Denhardt's hybridization buffer (900 mM sodium chloride, 6 mM disodium EDTA, 60 mM sodium phosphate pH 7.4, 0.5% SDS, 0.01% denatured herring sperm DNA and 0.1% each polyvinylpyrrolidone, bovine serum albumin, and FICOLL 400) containing 50% formamide at 42° C. To remove unbound probe, membranes were washed twice at room temperature and twice at 65° C. in 2× SSPE (300 mM sodium chloride, 2 mM disodium EDTA, 20 mM sodium phosphate, pH 7.4) containing 0.1% SDS, and exposed to film.

Northern Analysis

RNA was first extracted from leaf material according to the following protocol.

For extraction of RNA, Covey's extraction buffer was prepared by dissolving 1% (w/v) TIPS (triisopropyl-naphthalene sulfonate, sodium salt) and 6% (w/v) PAS (p-aminosalicylate, sodium salt) in 50 mM Tris pH 8.4 containing 5% (v/v) Kirby's phenol. Kirby's phenol was prepared by neutralizing liquified phenol containing 0.1% (w/v) 8-hydroxyquinoline with 0.1 M Tris-HCl pH 8.8. For each RNA preparation, a 1 g sample of plant tissue was ground in liquid nitrogen and extracted in 5 mL Covey's extraction buffer containing 10 μL β-mercaptoethanol. The sample was extracted with 5 mL of a 1:1 mixture of Kirby's phenol and chloroform, vortexed, and centrifuged for 20 min at 7,000×g. The supernatant was removed and the nucleic acids were precipitated with 500 μL of 3 M sodium acetate and 5 mL isopropanol and collected by centrifugation at 10,000×g for 10 min. The pellet was redissolved in 500 μL water, and the RNA was precipitated on ice with 250 μL 8 M LiCl, and collected by centrifugation at 10,000×g for 10 min. The pellet was resuspended in 200 μL water and extracted with an equal volume of chloroform:isoamyl alcohol 1:1 with vortexing. After centrifugation for 2 min at 10,000×g, the upper aqueous phase was removed, and the nucleic acids were precipitated at -20° C. by the addition of 20 μL 3 M sodium acetate and 200 μL isopropanol. The pellet was washed with 1 mL cold 70% ethanol, dried, and resuspended in 100 μL water. RNA content was assayed spectrophotometrically at 260 nm. Samples containing 1 to 10 μg of RNA were subjected to denaturing gel electrophoresis as described elsewhere (Sambrook et al., supra).
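The working salt concentrations implied by the precipitation steps above follow from a simple dilution calculation. The sketch below is an illustration under stated assumptions: volumes are treated as additive, the full 200 μL aqueous phase is assumed to be recovered before the final precipitation, and the function name is hypothetical.

```python
def final_molarity(stock_molar, stock_ul, other_ul):
    """Concentration of a stock after mixing with other liquid, volumes assumed additive."""
    return stock_molar * stock_ul / (stock_ul + other_ul)


# LiCl precipitation: 250 uL of 8 M LiCl added to the 500 uL redissolved pellet.
licl_molar = final_molarity(8.0, 250.0, 500.0)

# Final precipitation: 20 uL of 3 M sodium acetate added to an assumed
# 200 uL aqueous phase plus 200 uL isopropanol.
acetate_molar = final_molarity(3.0, 20.0, 200.0 + 200.0)

print(f"LiCl: {licl_molar:.2f} M, sodium acetate: {acetate_molar:.3f} M")
```

Under these assumptions the RNA is precipitated at about 2.7 M LiCl, and the final precipitation takes place at about 0.14 M sodium acetate.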

Extracted RNA was transferred to HYBOND N⁺ membrane (Amersham, Cleveland, Ohio), and probed with radiolabelled probes prepared from cDNA clones. Blots were hybridized overnight, washed twice at room temperature and once at 65° C. in 3× SSC (450 mM sodium chloride, 45 mM sodium citrate, pH 7.0) containing 0.1% SDS, and exposed to film.
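The salt concentrations quoted for the Southern and Northern buffers are whole multiples of the conventional 1× SSPE and 1× SSC recipes. The sketch below makes that scaling explicit; the 1× compositions are the conventional laboratory definitions and are an assumption here, since the text gives only the final concentrations, and the helper name `scale` is hypothetical.

```python
# Conventional 1x compositions in mM (assumed; not stated in the text).
SSPE_1X_MM = {"sodium chloride": 150, "sodium phosphate": 10, "disodium EDTA": 1}
SSC_1X_MM = {"sodium chloride": 150, "sodium citrate": 15}


def scale(recipe_mm, factor):
    """Return the component concentrations (mM) of an N-fold buffer."""
    return {component: conc * factor for component, conc in recipe_mm.items()}


southern_hybridization = scale(SSPE_1X_MM, 6)  # salts of the hybridization buffer
southern_wash = scale(SSPE_1X_MM, 2)           # 2x SSPE wash
northern_wash = scale(SSC_1X_MM, 3)            # 3x SSC wash

for name, buffer in [("6x SSPE", southern_hybridization),
                     ("2x SSPE", southern_wash),
                     ("3x SSC", northern_wash)]:
    print(name, buffer)
```

Read this way, the salts of the Southern hybridization buffer correspond to 6× SSPE, which is a higher ionic strength than either wash.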

Identification of cDNA and Genomic Clones

cDNA and genomic clones for F5H were identified by standard techniques using a 2.3 kb SacII/EcoRI fragment from the rescued plasmid (pCC1) (Example 2) as a probe. The cDNA clone pCC30 was identified in the λPRL2 library (Newman et al., (1994) supra) kindly provided by Dr. Thomas Newman (DOE Plant Research Laboratory, Michigan State University, East Lansing, Mich.). A genomic cosmid library of Arabidopsis thaliana (ecotype Landsberg erecta) generated in the binary cosmid vector pBIC20 (Example 3) (Meyer et al., Science 264, 1452, (1994)) was screened with the radiolabelled cDNA insert derived from pCC30. Genomic inserts in the pBIC20 T-DNA are flanked by the neomycin phosphotransferase gene for kanamycin selection adjacent to the T-DNA right border sequence, and the β-glucuronidase gene for histochemical selection adjacent to the left border. Positive clones were characterized by restriction digestion and Southern analysis in comparison to Arabidopsis genomic DNA.

Plant transformation

Transformation of Arabidopsis thaliana was performed by vacuum infiltration (Bent et al., Science 265, 1856, (1994)) with minor modifications. Briefly, 500 mL cultures of transformed Agrobacterium harboring the pBIC20-F5H cosmid or the pGA482-35S-F5H construct were grown to stationary phase in Luria broth containing 10 mg L⁻¹ rifampicin and 50 mg L⁻¹ kanamycin. Cells were harvested by centrifugation and resuspended in 1 L infiltration media containing 2.2 g MS salts (Murashige and Skoog, Physiol. Plant. 15, 473, (1962)), Gamborg's B5 vitamins (Gamborg et al., Exp. Cell Res. 50, 151, (1968)), 0.5 g MES, 50 g sucrose, 44 nM benzylaminopurine, and 200 μL Silwet L-77 (OSI Specialties) at pH 5.7. Bolting Arabidopsis plants (T₀ generation) that were 5 to 10 cm tall were inverted into the bacterial suspension and exposed to a vacuum (>500 mm of Hg) for three to five min. Infiltrated plants were returned to standard growth conditions for seed production. Transformed seedlings (T₁) were identified by selection on MS medium containing 50 mg L⁻¹ kanamycin and 200 mg L⁻¹ TIMENTIN (SmithKline Beecham) and were transferred to soil.
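For bench preparation, the molar and volumetric components of the infiltration medium can be converted to weighable amounts. The following sketch is illustrative only: the molecular weight used for benzylaminopurine (225.25 g/mol, the value for 6-benzylaminopurine) is an assumption not given in the text, and the function name is hypothetical.

```python
BAP_MW_G_PER_MOL = 225.25  # assumed molecular weight of 6-benzylaminopurine


def micrograms_needed(nanomolar, liters, mw_g_per_mol):
    """Mass in micrograms needed to reach a nanomolar concentration in a given volume."""
    moles = nanomolar * 1e-9 * liters
    return moles * mw_g_per_mol * 1e6


# 44 nM benzylaminopurine in the 1 L of infiltration medium.
bap_ug = micrograms_needed(44, 1.0, BAP_MW_G_PER_MOL)

# 200 uL Silwet L-77 in 1 L (1,000,000 uL), expressed as % (v/v).
silwet_percent = 200 / 1_000_000 * 100

print(f"benzylaminopurine: {bap_ug:.2f} ug per L; Silwet L-77: {silwet_percent:.2f} % (v/v)")
```

On that assumption, 44 nM corresponds to roughly 10 μg of benzylaminopurine per liter, and the surfactant is present at 0.02% (v/v).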

Transformation of tobacco was accomplished using the leaf disk method of Horsch et al. (Science 227, 1229, (1985)).

Nitrobenzene oxidation

For the determination of lignin monomer composition, stem tissue was ground to a powder in liquid nitrogen and extracted with 20 mL of 0.1 M sodium phosphate buffer, pH 7.2 at 37° C. for 30 min followed by three extractions with 80% ethanol at 80° C. The tissue was then extracted once with acetone and completely dried. Tissue was saponified by treatment with 1.0 M NaOH at 37° C. for 24 h, washed three times with water, once with 80% ethanol, once with acetone, and dried. Nitrobenzene oxidation of stem tissue samples was performed with a protocol modified from Iiyama et al. (J. Sci. Food Agric. 51, 481-491, (1990)). Samples of lignocellulosic material (5 mg each) were mixed with 500 μL of 2 M NaOH and 25 μL of nitrobenzene. This mixture was incubated in a sealed glass tube at 160° C. for 3 h. The reaction products were cooled to room temperature and 5 μL of a 20 mg mL⁻¹ solution of 3-ethoxy-4-hydroxybenzaldehyde in pyridine was added as an internal standard before the mixture was extracted twice with 1 mL of dichloromethane. The aqueous phase was acidified with HCl (pH 2) and extracted twice with 900 μL of ether. The combined ether phases were dried with anhydrous sodium sulfate and the ether was evaporated in a stream of nitrogen. The dried residue was resuspended in 50 μL of pyridine, 10 μL of BSA (N,O-bis-(trimethylsilyl)-trifluoroacetamide) was added and 1 μL aliquots of the silylated products were analyzed using a Hewlett-Packard 5890 Series II gas chromatograph equipped with a Supelco SPB I column (30 m×0.75 mm). Lignin monomer composition was calculated from the integrated areas of the peaks representing the trimethylsilylated derivatives of vanillin, syringaldehyde, vanillic acid and syringic acid. Total nitrobenzene oxidation-susceptible guaiacyl units (vanillin and vanillic acid) and syringyl units (syringaldehyde and syringic acid) were calculated following correction for recovery efficiencies of each of the products during the extraction procedure relative to the internal standard.
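The calculation described at the end of this procedure can be sketched as follows. This is a hedged illustration, not the exact computation used: it assumes that mol % S is defined as syringyl units divided by the sum of syringyl and guaiacyl units, that each product's recovery factor relative to the internal standard also absorbs any detector response difference, and all peak areas and recovery factors shown are hypothetical.

```python
GUAIACYL = ("vanillin", "vanillic acid")
SYRINGYL = ("syringaldehyde", "syringic acid")


def corrected_amount(peak_area, standard_area, relative_recovery):
    """Peak area normalized to the internal standard and corrected for recovery."""
    return (peak_area / standard_area) / relative_recovery


def mol_percent_syringyl(areas, standard_area, recovery):
    """Assumed definition: 100 * S / (S + G) over the four oxidation products."""
    g = sum(corrected_amount(areas[p], standard_area, recovery[p]) for p in GUAIACYL)
    s = sum(corrected_amount(areas[p], standard_area, recovery[p]) for p in SYRINGYL)
    return 100.0 * s / (s + g)


# Hypothetical integrated peak areas and recovery factors, for illustration only.
example_areas = {"vanillin": 700.0, "vanillic acid": 100.0,
                 "syringaldehyde": 170.0, "syringic acid": 30.0}
example_recovery = {product: 1.0 for product in example_areas}

example_mol_pct_s = mol_percent_syringyl(example_areas, 1000.0, example_recovery)
print(f"{example_mol_pct_s:.1f} mol % S")
```

With these illustrative inputs the guaiacyl and syringyl totals stand in a 4:1 ratio, giving 20 mol % S.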

EXAMPLE 1 IDENTIFICATION OF THE T-DNA TAGGED ALLELE OF FAH1

A putatively T-DNA tagged fah1 mutant was identified in a collection of T-DNA tagged lines (Feldmann et al., Mol. Gen. Genet. 208, 1, (1987)) (Dr. Tim Caspar, DuPont, Wilmington, Del.) by screening adult plants under long wave UV light. A red fluorescent line (line 3590) was selected, and its progeny were assayed for sinapoylmalate content by TLC. The analyses indicated that line 3590 did not accumulate sinapoylmalate. Reciprocal crosses of line 3590 to a fah1-2 homozygote, followed by analysis of the F1 generation for sinapoylmalate content, demonstrated that line 3590 was a new allele of fah1, and it was designated fah1-9.

Preliminary experiments indicated co-segregation of the kanamycin-resistant phenotype of the T-DNA tagged mutant with the fah1 phenotype. Selfed seed from 7 kanamycin-resistant [fah1-9×FAH1] F1 plants segregated 1:3 for kanamycin resistance (kan^(sensitive):kan^(resistant)) and 3:1 for sinapoylmalate deficiency (FAH1:fah1). From these lines, fah1 plants gave rise to only kan^(resistant), fah1 progeny. To determine the genetic distance between the T-DNA insertion and the FAH1 locus, multiple test crosses were performed between a [fah1-9×FAH1] F1 and a fah1-2 homozygote. The distance between the FAH1 locus and the T-DNA insertion was evaluated by determining the frequency at which FAH1/kan^(sensitive) progeny were recovered in the test cross F1. In the absence of crossover events, all kanamycin-resistant F1 progeny would be unable to accumulate sinapoylmalate, and would thus fluoresce red under UV light. In 682 kan^(resistant) F1 progeny examined, no sinapoylmalate proficient plants were identified, indicating a very tight linkage between the T-DNA insertion site and the FAH1 locus.
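The strength of the linkage implied by finding no recombinants among 682 progeny can be expressed as an upper bound on the recombination frequency. The sketch below is a statistical illustration that does not appear in the specification: it assumes each kanamycin-resistant test cross progeny represents an independent informative meiosis and uses the exact binomial bound for zero observed events; the function name is hypothetical.

```python
def recombination_upper_bound(n_progeny, confidence=0.95):
    """Upper confidence bound on recombination frequency when zero recombinants are seen.

    Solves (1 - r) ** n_progeny = 1 - confidence for r.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / n_progeny)


bound = recombination_upper_bound(682)
print(f"recombination frequency below {bound * 100:.2f} % at 95 % confidence")
```

Under these assumptions the T-DNA insertion lies within roughly 0.44% recombination of the FAH1 locus, consistent with the tight linkage stated in the text.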

EXAMPLE 2 PLASMID RESCUE AND cDNA CLONING OF THE fah1 GENE

Plasmid rescue was conducted using EcoRI-digested DNA prepared from homozygous fah1-9 plants (Behringer et al., (1992), supra). Five μg of EcoRI-digested genomic DNA was incubated with 125 U T4 DNA ligase overnight at 14° C. in a final volume of 1 mL. The ligation mixture was concentrated approximately four fold by two extractions with equal volumes of 2-butanol, and was then ethanol precipitated and electroporated into competent DH5α cells as described (Newman et al., (1994), supra).

DNA from rescued plasmids was double digested with EcoRI and SalI. Plasmids generated from internal T-DNA sequences were identified by the presence of triplet bands at 3.8, 2.4 and 1.2 kb and were discarded. One plasmid (pCC1) giving rise to the expected 3.8 kb band plus a novel 5.6 kb band was identified as a putative external right border plasmid. Using a SacII/EcoRI fragment of pCC1 that appeared to represent Arabidopsis DNA, a putative cDNA clone for F5H (pCC30) was identified. The putative F5H clone carried a 1.9 kb SalI-NotI insert, the sequence of which was determined. BLASTX analysis (Altschul et al., J. Mol. Biol. 215, 403, (1990)) indicated that this cDNA encodes a cytochrome P450-dependent monooxygenase, consistent with earlier reports that (i) the fah1 mutant is defective in F5H (Chapple et al., supra) and (ii) F5H is a cytochrome P450-dependent monooxygenase (Grand, supra).

Southern and Northern Blot analysis

To determine whether the putative F5H cDNA actually represented the gene that was disrupted in the T-DNA tagged line, Southern and northern analyses were used to characterize the available fah1 mutants using the putative F5H cDNA as a probe.

FIG. 6 shows a Southern blot comparing hybridization of the F5H cDNA to EcoRI-digested genomic DNA isolated from wild type (ecotypes Columbia (Col), Landsberg erecta (LER), and Wassilewskija (WS)) and the nine fah1 alleles including the T-DNA tagged fah1-9 allele. WS is the ecotype from which the T-DNA tagged line was generated.

These data indicated the presence of a restriction fragment length polymorphism between the tagged line and the wild type. These data also indicate a restriction fragment length polymorphism in the fah1-8 allele, which was generated with fast neutrons, a technique reported to cause deletion mutations.

As shown in FIG. 6, the genomic DNA of the fah1-8 and fah1-9 (the T-DNA tagged line) alleles is disrupted in the region corresponding to the putative F5H cDNA. These data also indicate that F5H is encoded by a single gene in Arabidopsis, as expected considering that the mutation in the fah1 mutant segregates as a single Mendelian gene. These data provide the first indication that the putative F5H cDNA corresponds to the gene that is disrupted in the fah1 mutants.

Plant material homozygous for nine independently-derived fah1 alleles was surveyed for the abundance of transcript corresponding to the putative F5H cDNA using Northern blot analysis. The data are shown in FIG. 7.

As can be seen from the data, the putative F5H mRNA was represented at similar levels in leaf tissue of Columbia, Landsberg erecta and Wassilewskija ecotypes, and in the EMS-induced fah1-1, fah1-4, and fah1-5, as well as the fast neutron-induced fah1-7. Transcript abundance was substantially reduced in leaves from plants homozygous for fah1-2, fah1-3 and fah1-6, all of which were EMS-induced, in the fast neutron-induced mutant fah1-8 and in the tagged line fah1-9. The mRNA in the fah1-8 mutant also appears to be truncated. These data provided strong evidence that the cDNA clone that had been identified is encoded by the FAH1 locus.

EXAMPLE 3 DEMONSTRATION OF THE IDENTITY OF THE F5H cDNA BY TRANSFORMATION OF fah1 MUTANT PLANTS WITH WILDTYPE F5H AND RESTORATION OF SINAPOYLMALATE ACCUMULATION

In order to demonstrate the identity of the F5H gene at the functional level, the transformation-competent pBIC20 cosmid library (Meyer et al., supra) was screened for corresponding genomic clones using the full length F5H cDNA as a probe. A clone (pBIC20-F5H) carrying a genomic insert of 17 kb that contains 2.2 kb of sequence upstream of the putative F5H start codon and 12.5 kb of sequence downstream of the stop codon of the F5H gene (FIG. 2) was transformed into the fah1-2 mutant by vacuum infiltration. Thirty independent infiltration experiments were performed, and 167 kanamycin-resistant seedlings, representing at least 3 transformants from each infiltration, were transferred to soil and were analyzed with respect to sinapic acid-derived secondary metabolites. Of these plants, 164 accumulated sinapoylmalate in their leaf tissue as determined by TLC (FIG. 3). These complementation data indicate that the gene defective in the fah1 mutant is present on the binary cosmid pBIC20-F5H.

To delimit the region of DNA on the pBIC20-F5H cosmid responsible for complementation of the mutant phenotype, a 2.7 kb fragment of the F5H genomic sequence was fused downstream of the cauliflower mosaic virus 35S promoter in the binary plasmid pGA482 and this construct (pGA482-35S-F5H) (FIG. 2) was transformed into the fah1-2 mutant. The presence of sinapoylmalate in 109 out of 110 transgenic lines analyzed by TLC or by in vivo fluorescence under UV light indicated that the fah1 mutant phenotype had been complemented (FIG. 3). These data provide conclusive evidence that the F5H cDNA has been identified.
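The complementation frequencies reported in this Example can be expressed as simple proportions. The sketch below uses only the counts given in the text (164 of 167 cosmid transformants and 109 of 110 lines carrying the 35S construct); the function name is hypothetical.

```python
def percent_complemented(complemented, total):
    """Percentage of transformants in which sinapoylmalate accumulation was restored."""
    return 100.0 * complemented / total


cosmid_pct = percent_complemented(164, 167)     # pBIC20-F5H transformants
construct_pct = percent_complemented(109, 110)  # pGA482-35S-F5H transformants

print(f"pBIC20-F5H: {cosmid_pct:.1f} %, pGA482-35S-F5H: {construct_pct:.1f} %")
```

Both constructs therefore restored sinapoylmalate accumulation in more than 98% of the transformants examined.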

EXAMPLE 4 DNA SEQUENCING OF THE F5H cDNA AND GENOMIC CLONES

The F5H cDNA and a 5156 bp HindIII-XhoI fragment of the pBIC20-F5H genomic clone were both fully sequenced on both strands and the sequence of the F5H protein (SEQ ID NO.: 2) was inferred from the cDNA sequence (FIG. 8). The sequence of the Arabidopsis thaliana F5H cDNA is given in SEQ ID NO.: 1. The sequence of the Arabidopsis thaliana F5H genomic clone is given in SEQ ID NO.: 3.

EXAMPLE 5 MODIFICATION OF LIGNIN MONOMER COMPOSITION IN TRANSGENIC PLANTS OVEREXPRESSING F5H

Generation of Transgenic Plants Ectopically Expressing the F5H Gene

Using an adaptor-based cloning strategy, regulatory sequences 5' of the translation initiation site of the F5H gene were replaced with the strong constitutive cauliflower mosaic virus 35S promoter (Odell et al., Nature 313, 810-812, (1985)), as shown in FIG. 2. The resulting construct carries 2719 bp of the F5H genomic sequence driven by the cauliflower mosaic virus 35S promoter fused 50 bp upstream of the inferred ATG start codon. As a result, the cauliflower mosaic virus 35S promoter drives the expression of the F5H gene by using the transcription start site of the viral promoter and the termination signal present on the F5H genomic sequence. This expression cassette for ectopic expression of F5H was inserted into the T-DNA of the binary vector pGA482 (An, G. (1987), Binary Ti vectors for plant transformation and promoter analysis, in: Methods in Enzymology, Wu, R. ed. Academic Press, N.Y. 153: 292-305) and introduced into Agrobacterium tumefaciens by electroporation.

Arabidopsis plants of the ecotype Columbia that were homozygous for the fah1-2 allele (Chapple et al., supra) were transformed with Agrobacterium cultures harboring the pGA482-35S-F5H construct according to the method of Bent et al. (supra). Transgenic plants of the T2 and T3 generation were identified by selection on media containing kanamycin and subsequently transferred to soil.

Determination of lignin monomer composition of Arabidopsis stem tissue

Total stem tissue was harvested from 4 week old plants that had been grown in soil at 22° C. under a 16 h/8 h light/dark photoperiod. Nitrobenzene oxidation analysis generated mol % syringyl values for 9 different transformant lines (Table 1) ranging from 5.06±0.17 mol % to 28.8±0.92 mol % as opposed to the wildtype control, which demonstrated a value of 18.4±0.91 mol %. The fah1-2 mutant background in which the transgenic lines were generated completely lacks syringyl lignin (Table 1). The low expression of the F5H transgene in a genetic background that lacks endogenous F5H message explains how line 88 can have syringyl lignin levels that are lower than wild type.

In addition to Arabidopsis, tobacco plants were transformed in a similar fashion with the F5H gene under control of the cauliflower mosaic virus 35S promoter. T2 and T3 positive transformants were screened and analyzed for lignin modification and the data are given in Table 2. Nitrobenzene oxidation analysis of tobacco leaf midribs generated mol % syringyl values for 4 different transformant lines (Table 2) ranging from 22.4±1.53 mol % to 40.0±1.86 mol % as opposed to the wildtype control, which demonstrated a value of 14.3±1.09 mol %.

The data in Tables 1 and 2 clearly demonstrate that over-expression of the F5H gene in transgenic plants results in the modification of lignin monomer composition. The transformed plant is reasonably expected to have syringyl lignin monomer content that is from about 0 mol % to about 95 mol % as measured in whole plant tissue.

    __________________________________________________________________________    #             SEQUENCE LISTING                                                   - -  - - <160> NUMBER OF SEQ ID NOS: 3                                        - - <210> SEQ ID NO 1                                                        <211> LENGTH: 1838                                                            <212> TYPE: DNA                                                               <213> ORGANISM: Arabidopsis thaliana                                           - - <400> SEQUENCE: SEQ ID NO: 1                                              - - aaaaaaaaca ctcaatatgg agtcttctat atcacaaaca ctaagcaaac ta -            #tcagatcc     60                                                                 - - cacgacgtct cttgtcatcg ttgtctctct tttcatcttc atcagcttca tc -            #acacggcg    120                                                                 - - gcgaaggcct ccatatcctc ccggtccacg aggttggccc atcataggca ac -            #atgttaat    180                                                                 - - gatggaccaa ctcacccacc gtggtttagc caatttagct aaaaagtatg gc -            #ggattgtg    240                                                                 - - ccatctccgc atgggattcc tccatatgta cgctgtctca tcacccgagg tg -            #gctcgaca    300                                                                 - - agtccttcaa gtccaagaca gcgtcttctc gaaccggcct gcaactatag ct -            #ataagcta    360                                                                 - - tctgacttac gaccgagcgg acatggcttt cgctcactac ggaccgtttt gg -            #agacagat    420                                                                 - - gagaaaagtg tgtgtcatga aggtgtttag ccgtaaaaga gctgagtcat gg -            #gcttcagt    480                                                                 - - tcgtgatgaa gtggacaaaa tggtccggtc ggtctcttgt aacgttggta ag -            #cctataaa    540                              
                                   - - cgtcggggag caaatttttg cactgacccg caacataact taccgggcag cg -            #tttgggtc    600                                                                 - - agcctgcgag aagggacaag acgagttcat aagaatctta caagagttct ct -            #aagctttt    660                                                                 - - tggagccttc aacgtagcgg atttcatacc atatttcggg tggatcgatc cg -            #caagggat    720                                                                 - - aaacaagcgg ctcgtgaagg cccgtaatga tctagacgga tttattgacg at -            #attatcga    780                                                                 - - tgaacatatg aagaagaagg agaatcaaaa cgctgtggat gatggggatg tt -            #gtcgatac    840                                                                 - - cgatatggtt gatgatcttc ttgcttttta cagtgaagag gccaaattag tc -            #agtgagac    900                                                                 - - agcggatctt caaaattcca tcaaacttac ccgtgacaat atcaaagcaa tc -            #atcatgga    960                                                                 - - cgttatgttt ggaggaacgg aaacggtagc gtcggcgata gagtgggcct ta -            #acggagtt   1020                                                                 - - attacggagc cccgaggatc taaaacgggt ccaacaagaa ctcgccgaag tc -            #gttggact   1080                                                                 - - tgacagacga gttgaagaat ccgacatcga gaagttgact tatctcaaat gc -            #acactcaa   1140                                                                 - - agaaacccta aggatgcacc caccgatccc tctcctcctc cacgaaaccg cg -            #gaggacac   1200                                                                 - - tagtatcgac ggtttcttca ttcccaagaa atctcgtgtg atgatcaacg cg -            #tttgccat   1260                                                                 - - aggacgcgac ccaacctctt ggactgaccc ggacacgttt agaccatcga gg -            #tttttgga   1320  
                                                               - - accgggcgta ccggatttca aagggagcaa tttcgagttt ataccgttcg gg -            #tcgggtcg   1380                                                                 - - tagatcgtgc ccgggtatgc aactagggtt atacgcgctt gacttagccg tg -            #gctcatat   1440                                                                 - - attacattgc ttcacgtgga aattacctga tgggatgaaa ccaagtgagc tc -            #gacatgaa   1500                                                                 - - tgatgtgttt ggtctcacgg ctcctaaagc cacgcggctt ttcgccgtgc ca -            #accacgcg   1560                                                                 - - cctcatctgt gctctttaag tttatggttc gagtcacgtg gcagggggtt tg -            #gtatggtg   1620                                                                 - - aaaactgaaa agtttgaagt tgccctcatc gaggatttgt ggatgtcata tg -            #tatgtatg   1680                                                                 - - tgtatacacg tgtgttctga tgaaaacaga tttggctctt tgtttgccct tt -            #tttttttt   1740                                                                 - - ttctttaatg gggattttcc ttgaatgaaa tgtaacagta aaaataagat tt -            #ttttcaat   1800                                                                 - - aagtaattta gcatgttgca aaaaaaaaaa aaaaaaaa      - #                      - #   1838                                                                     - -  - - <210> SEQ ID NO 2                                                   <211> LENGTH: 520                                                             <212> TYPE: PRT                                                               <213> ORGANISM: Artificial                                                    <220> FEATURE:                                                                <223> OTHER INFORMATION: Sequence is deduced from - #DNA sequence of                 SEQ ID NO:1                                                 
            - - <400> SEQUENCE: SEQ ID NO: 2                                              - - Met Glu Ser Ser Ile Ser Gln Thr Leu Ser Ly - #s Leu Ser Asp Pro Thr      1               5   - #                10  - #                15               - - Thr Ser Leu Val Ile Val Val Ser Leu Phe Il - #e Phe Ile Ser Phe Ile                  20      - #            25      - #            30                   - - Thr Arg Arg Arg Arg Pro Pro Tyr Pro Pro Gl - #y Pro Arg Gly Trp Pro              35          - #        40          - #        45                       - - Ile Ile Gly Asn Met Leu Met Met Asp Gln Le - #u Thr His Arg Gly Leu          50              - #    55              - #    60                           - - Ala Asn Leu Ala Lys Lys Tyr Gly Gly Leu Cy - #s His Leu Arg Met Gly      65                  - #70                  - #75                  - #80        - - Phe Leu His Met Tyr Ala Val Ser Ser Pro Gl - #u Val Ala Arg Gln Val                      85  - #                90  - #                95               - - Leu Gln Val Gln Asp Ser Val Phe Ser Asn Ar - #g Pro Ala Thr Ile Ala                  100      - #           105      - #           110                  - - Ile Ser Tyr Leu Thr Tyr Asp Arg Ala Asp Me - #t Ala Phe Ala His Tyr              115          - #       120          - #       125                      - - Gly Pro Phe Trp Arg Gln Met Arg Lys Val Cy - #s Val Met Lys Val Phe          130              - #   135              - #   140                          - - Ser Arg Lys Arg Ala Glu Ser Trp Ala Ser Va - #l Arg Asp Glu Val Asp      145                 1 - #50                 1 - #55                 1 -      #60                                                                              - - Lys Met Val Arg Ser Val Ser Cys Asn Val Gl - #y Lys Pro Ile Asn        Val                                                                                             165  - #               170  - #               175             - - Gly Glu Gln Ile Phe Ala Leu Thr 
Arg Asn Il - #e Thr Tyr Arg Ala Ala                  180      - #           185      - #           190                  - - Phe Gly Ser Ala Cys Glu Lys Gly Gln Asp Gl - #u Phe Ile Arg Ile Leu              195          - #       200          - #       205                      - - Gln Glu Phe Ser Lys Leu Phe Gly Ala Phe As - #n Val Ala Asp Phe Ile          210              - #   215              - #   220                          - - Pro Tyr Phe Gly Trp Ile Asp Pro Gln Gly Il - #e Asn Lys Arg Leu Val      225                 2 - #30                 2 - #35                 2 -      #40                                                                              - - Lys Ala Arg Asn Asp Leu Asp Gly Phe Ile As - #p Asp Ile Ile Asp        Glu                                                                                             245  - #               250  - #               255             - - His Met Lys Lys Lys Glu Asn Gln Asn Ala Va - #l Asp Asp Gly Asp Val                  260      - #           265      - #           270                  - - Val Asp Thr Asp Met Val Asp Asp Leu Leu Al - #a Phe Tyr Ser Glu Glu              275          - #       280          - #       285                      - - Ala Lys Leu Val Ser Glu Thr Ala Asp Leu Gl - #n Asn Ser Ile Lys Leu          290              - #   295              - #   300                          - - Thr Arg Asp Asn Ile Lys Ala Ile Ile Met As - #p Val Met Phe Gly Gly      305                 3 - #10                 3 - #15                 3 -      #20                                                                              - - Thr Glu Thr Val Ala Ser Ala Ile Glu Trp Al - #a Leu Thr Glu Leu        Leu                                                                                             325  - #               330  - #               335             - - Arg Ser Pro Glu Asp Leu Lys Arg Val Gln Gl - #n Glu Leu Ala Glu Val                  340      - #           345      - #           350                  - - Val 
Gly Leu Asp Arg Arg Val Glu Glu Ser As - #p Ile Glu Lys Leu Thr              355          - #       360          - #       365                      - - Tyr Leu Lys Cys Thr Leu Lys Glu Thr Leu Ar - #g Met His Pro Pro Ile          370              - #   375              - #   380                          - - Pro Leu Leu Leu His Glu Thr Ala Glu Asp Th - #r Ser Ile Asp Gly Phe      385                 3 - #90                 3 - #95                 4 -      #00                                                                              - - Phe Ile Pro Lys Lys Ser Arg Val Met Ile As - #n Ala Phe Ala Ile        Gly                                                                                             405  - #               410  - #               415             - - Arg Asp Pro Thr Ser Trp Thr Asp Pro Asp Th - #r Phe Arg Pro Ser Arg                  420      - #           425      - #           430                  - - Phe Leu Glu Pro Gly Val Pro Asp Phe Lys Gl - #y Ser Asn Phe Glu Phe              435          - #       440          - #       445                      - - Ile Pro Phe Gly Ser Gly Arg Arg Ser Cys Pr - #o Gly Met Gln Leu Gly          450              - #   455              - #   460                          - - Leu Tyr Ala Leu Asp Leu Ala Val Ala His Il - #e Leu His Cys Phe Thr      465                 4 - #70                 4 - #75                 4 -      #80                                                                              - - Trp Lys Leu Pro Asp Gly Met Lys Pro Ser Gl - #u Leu Asp Met Asn        Asp                                                                                             485  - #               490  - #               495             - - Val Phe Gly Leu Thr Ala Pro Lys Ala Thr Ar - #g Leu Phe Ala Val Pro                  500      - #           505      - #           510                  - - Thr Thr Arg Leu Ile Cys Ala Leu                                                  515          - #       520                         
                    - -  - - <210> SEQ ID NO 3                                                   <211> LENGTH: 5156                                                            <212> TYPE: DNA                                                               <213> ORGANISM: Arabidopsis thaliana                                           - - <400> SEQUENCE: SEQ ID NO: 3                                              - - aagcttatgt atttccttat aaccatttta ttctgtatat agggggacag aa -             #acataata     60                                                                 - - agtaacaaat agtggtttta tttttttaaa tatacaaaaa ctgtttaacc at -            #tttatttc    120                                                                 - - ttggttagca aaattttgat atattcttaa gaaactaata ttttaggttg at -            #atattgca    180                                                                 - - gtcactaaat agttttaaaa gacacgaagt tggtaagaac aggcatatat ta -            #ttcgattt    240                                                                 - - aattaggaat gcttatgtta atctgattcg actaattaga aacgacgata ct -            #atgagctc    300                                                                 - - atagatggtc ccacgaccca ctctcccatt tgatcaatat tcaactgagc aa -            #tgaaacta    360                                                                 - - attaaaaacg tggttagatt aaaaaaataa attgtgcagg tagcggatat at -            #aatactag    420                                                                 - - taggggttaa aaataaaata aaacaccaca gtattaaatt tttgtttcaa aa -            #gtattatc    480                                                                 - - aatagttttt ttgcttcaaa aatatcacaa atttttgtat gaaatatttc tt -            #taacgaaa    540                                                                 - - ataaattaaa taaaatttaa aatttatatt tggagttcta tttttaattt ag -            #agttttta    600                                                                 - - ttgttaccac attttttgaa 
ttattctaat attaatttgt gatattatta ca -            #aaaagtaa    660                                                                 - - aaatatgata ttttagaata ctattatcga tatttgatat tattgacctt ag -            #ctttgttt    720                                                                 - - gggtggagac atgtgattat cttattacct ttttattcca tgaaactaca ga -            #gttcgcca    780                                                                 - - ggtaccatac atgcacacac cctcgtgaag ccgtgactta atatgatcta ga -            #acttaaat    840                                                                 - - agtactacta attgtgtcat ttgaactttc tcctatgtcg gtttcacttc at -            #gtatcgca    900                                                                 - - gaacaggtgg aatacagtgt ccttgagttt cacccaaatc ggtccaattt tg -            #tgatatat    960                                                                 - - attgcgatac agacatacag cctacagagt tttgtcttag cccactggtt gg -            #caaacgaa   1020                                                                 - - attgtcttta tttttttatg ttttgttgtc aatgtgtctt tgtttttaac ta -            #gattgagg   1080                                                                 - - tttaatttta atacatttgt tagtttacag attatgcagt gtaatctgat aa -            #tgtaagtt   1140                                                                 - - gaactgcgtt ggtcaaagtc ttgtgtaacg cactgtatct aaattgtgag ta -            #acgacaaa   1200                                                                 - - ataattaaaa ttaaaggacc ttcaagtatt attagtatct ctgtctaaga tg -            #cacaggta   1260                                                                 - - ttcagtaata gtaataaata attacttgta taattaatat ctaattagta aa -            #ccttgtgt   1320                                                                 - - ctaaacctaa atgagcataa atccaaaagc aaaaatctaa acctaactga aa -            #aagtcatt   1380                                                               
  - - acgaaaaaaa gaaaaaaaaa agagaaaaaa ctacctgaaa agtcatgcac aa -            #cgttcatc   1440                                                                 - - ttggctaaat ttatttagtt tattaaatac aaaaatggcg agtttctgga gt -            #ttgttgaa   1500                                                                 - - aatatatttg tttagccact ttagaatttc ttgttttaat ttgttattaa ga -            #tatatcga   1560                                                                 - - gataatgcgt ttatatcacc aatatttttg ccaaactagt cctatacagt ca -            #tttttcaa   1620                                                                 - - cagctatgtt cactaattta aaacccactg aaagtcaatc atgattcgtc at -            #atttatat   1680                                                                 - - gctcgaattc agtaaaatcc gtttggtata ctatttattt cgtataagta tg -            #taattcca   1740                                                                 - - ctagatttcc ttaaactaaa ttatatattt acataattgt tttctttaaa ag -            #tctacaac   1800                                                                 - - agttattaag ttataggaaa ttatttcttt tatttttttt tttttttagg aa -            #attatttc   1860                                                                 - - ttttgcaaca catttgtcgt ttgcaaactt ttaaaagaaa ataaatgatt gt -            #tataattg   1920                                                                 - - attacatttc agtttatgac agattttttt tatctaacct ttaatgtttg tt -            #tccctgtt   1980                                                                 - - tttaggaaaa tcataccaaa atatatttgt gatcacagta aatcacggaa ta -            #gttatgac   2040                                                                 - - caagattttc aaagtaatac ttagaatcct attaaataaa cgaaatttta gg -            #aagaaata   2100                                                                 - - atcaagattt taggaaacga tttgagcaag gatttagaag atttgaatct tt -            #aattaaat   2160                                   
                              - - attttcattc ctaaataatt aatgctagtg gcataatatt gtaaataagt tc -            #aagtacat   2220                                                                 - - gattaatttg ttaaaatggt tgaaaaatat atatatgtag attttttcaa aa -            #ggtatact   2280                                                                 - - aattattttc atattttcaa gaaaatataa gaaatggtgt gtacatatat gg -            #atgaagaa   2340                                                                 - - atttaagtag ataatacaaa aatgtcaaaa aaagggacca cacaatttga tt -            #ataaaacc   2400                                                                 - - tacctctcta atcacatccc aaaatggaga actttgcctc ctgacaacat tt -            #cagaaaat   2460                                                                 - - aatcgaatcc aaaaaaaaca ctcaatatgg agtcttctat atcacaaaca ct -            #aagcaaac   2520                                                                 - - tatcagatcc cacgacgtct cttgtcatcg ttgtctctct tttcatcttc at -            #cagcttca   2580                                                                 - - tcacacggcg gcgaaggcct ccatatcctc ccggtccacg aggttggccc at -            #cataggca   2640                                                                 - - acatgttaat gatggaccaa ctcacccacc gtggtttagc caatttagct aa -            #aaagtatg   2700                                                                 - - gcggattgtg ccatctccgc atgggattcc tccatatgta cgctgtctca tc -            #acccgagg   2760                                                                 - - tggctcgaca agtccttcaa gtccaagaca gcgtcttctc gaaccggcct gc -            #aactatag   2820                                                                 - - ctataagcta tctgacttac gaccgagcgg acatggcttt cgctcactac gg -            #accgtttt   2880                                                                 - - ggagacagat gagaaaagtg tgtgtcatga aggtgtttag ccgtaaaaga gc -            #tgagtcat   2940       
                                                          - - gggcttcagt tcgtgatgaa gtggacaaaa tggtccggtc ggtctcttgt aa -            #cgttggta   3000                                                                 - - agctacttca catattcacc actcttgcta tatatatgtg caattaaaca aa -            #tatgtaaa   3060                                                                 - - aagtgaaagt actcatttct tctttcttta gtatgtactt taacatttaa cc -            #aaaacaat   3120                                                                 - - tgtaggtaag cctataaacg tcggggagca aatttttgca ctgacccgca ac -            #ataactta   3180                                                                 - - ccgggcagcg tttgggtcag cctgcgagaa gggacaagac gagttcataa ga -            #atcttaca   3240                                                                 - - agagttctct aagctttttg gagccttcaa cgtagcggat ttcataccat at -            #ttcgggtg   3300                                                                 - - gatcgatccg caagggataa acaagcggct cgtgaaggcc cgtaatgatc ta -            #gacggatt   3360                                                                 - - tattgacgat attatcgatg aacatatgaa gaagaaggag aatcaaaacg ct -            #gtggatga   3420                                                                 - - tggggatgtt gtcgataccg atatggttga tgatcttctt gctttttaca gt -            #gaagaggc   3480                                                                 - - caaattagtc agtgagacag cggatcttca aaattccatc aaacttaccc gt -            #gacaatat   3540                                                                 - - caaagcaatc atcatggtaa ttatatttca aaaagcacta gtcatagtca tg -            #tttcttaa   3600                                                                 - - tgcgttacgt aataatactt atccattgac cagttatttt ctcctaagtt tt -            #tttgtttg   3660                                                                 - - aattaggaag gtaattttct attttactag agaaagcaac agattttagc at -       
     #gatctttt   3720                                                                 - - tttaatatat atagaagcat tgaatattca gatctacaat aattatgaaa ct -            #aatgaaga   3780                                                                 - - gacaaaaaat ggagagagaa aaaagaaaga gtggactagt gtggatatat tt -            #aattctaa   3840                                                                 - - tttgatttta ttaggacgtt atatttaatt ctaatttgat ttttttattt ga -            #ttttatta   3900                                                                 - - ggacgttatg tttggaggaa cggaaacggt agcgtcggcg atagagtggg cc -            #ttaacgga   3960                                                                 - - gttattacgg agccccgagg atctaaaacg ggtccaacaa gaactcgccg aa -            #gtcgttgg   4020                                                                 - - acttgacaga cgagttgaag aatccgacat cgagaagttg acttatctca aa -            #tgcacact   4080                                                                 - - caaagaaacc ctaaggatgc acccaccgat ccctctcctc ctccacgaaa cc -            #gcggagga   4140                                                                 - - cactagtatc gacggtttct tcattcccaa gaaatctcgt gtgatgatca ac -            #gcgtttgc   4200                                                                 - - cataggacgc gacccaacct cttggactga cccggacacg tttagaccat cg -            #aggttttt   4260                                                                 - - ggaaccgggc gtaccggatt tcaaagggag caatttcgag tttataccgt tc -            #gggtcggg   4320                                                                 - - tcgtagatcg tgcccgggta tgcaactagg gttatacgcg cttgacttag cc -            #gtggctca   4380                                                                 - - tatattacat tgcttcacgt ggaaattacc tgatgggatg aaaccaagtg ag -            #ctcgacat   4440                                                                 - - gaatgatgtg tttggtctca cggctcctaa 
agccacgcgg cttttcgccg tg -            #ccaaccac   4500                                                                 - - gcgcctcatc tgtgctcttt aagtttatgg ttcgagtcac gtggcagggg gt -            #ttggtatg   4560                                                                 - - gtgaaaactg aaaagtttga agttgccctc atcgaggatt tgtggatgtc at -            #atgtatgt   4620                                                                 - - atgtgtatac acgtgtgttc tgatgaaaac agatttggct ctttgtttgc cc -            #tttttttt   4680                                                                 - - tttttcttta atggggattt tccttgaatg aaatgtaaca gtaaaaataa ga -            #tttttttc   4740                                                                 - - aataagtaat ttagcatgtt gcaaagatcg atcttggatg agaacttcta ct -            #taaaaaaa   4800                                                                 - - aaaaaaaaat ttttttttag ttatttcacc tttttctttt gttctggttg ta -            #tggttgcc   4860                                                                 - - attgtgtcaa ttaggggctg gaagttcgct ggttaaggct aaatcagagt ta -            #aagttata   4920                                                                 - - attttacaag cccaacaaaa ggtcgcagat taaaaccaca tgatatttat aa -            #aaaaaatt   4980                                                                 - - ctaaggtttt tattagtttt attttcagtt tactgagtac tatttacttt tt -            #tatttttt   5040                                                                 - - gcaaataaat gtattttatc atatttatgt tttttgttat aaactccaaa ca -            #tacaggtt   5100                                                                 - - tcattaccta aaaaaagaca gagtggtttc gttaattttg tttcattaat ct - #cgag           5156                                                                     __________________________________________________________________________

What is claimed is:
 1. An isolated nucleic-acid fragment encoding an active plant F5H enzyme having an amino acid sequence selected from the group consisting of (1) the amino acid sequence of SEQ ID NO: 2, and (2) an amino acid sequence of SEQ ID NO: 2 encompassing amino acid substitutions, additions and deletions that do not alter the function of the active plant F5H enzyme.
 2. A chimeric gene causing altered guaiacyl:syringyl lignin monomer ratios in a plant cell transformed with the chimeric gene, the chimeric gene comprising: a regulatory sequence; and a nucleic-acid fragment encoding an active plant F5H enzyme having an amino acid sequence selected from the group consisting of (1) the amino acid sequence of SEQ ID NO: 2, and (2) an amino acid sequence of SEQ ID NO: 2 encompassing amino acid substitutions, additions and deletions that do not alter the function of the active plant F5H enzyme; and wherein the nucleic-acid fragment is operably linked in either sense or antisense orientation to the regulatory sequence.
 3. The chimeric gene of claim 2 wherein the nucleic acid fragment is operably linked in the sense orientation to at least one regulatory sequence.
 4. The chimeric gene of claim 2 wherein the regulatory sequence comprises a promoter selected from the group consisting of cauliflower mosaic virus 35S promoter, the promoter for the phenylalanine ammonia lyase gene and the promoter for the p-coumaroyl CoA ligase gene.
 5. The chimeric gene of claim 2, wherein the regulatory sequence comprises an endogenous plant promoter effective for controlling expression of a plant F5H gene.
 6. An isolated nucleic-acid fragment selected from the group consisting of the nucleic acid fragment of SEQ ID NO: 1, the nucleic acid fragment of SEQ ID NO: 3 and a nucleic acid fragment of SEQ ID NO: 1 or SEQ ID NO: 3 encompassing base changes that do not alter the function of the encoded plant F5H enzyme.
 7. A chimeric gene causing altered guaiacyl:syringyl lignin monomer ratios in a plant cell transformed with the chimeric gene, the chimeric gene comprising an isolated nucleic-acid fragment selected from the group consisting of the nucleic acid fragment of SEQ ID NO: 1, the nucleic acid fragment of SEQ ID NO: 3 and a nucleic acid fragment of SEQ ID NO: 1 or SEQ ID NO: 3 encompassing base changes that do not alter the function of the encoded plant F5H enzyme, operably linked in either sense or antisense orientation to at least one regulatory sequence.
 8. The chimeric gene of claim 7 wherein the nucleic acid fragment is operably linked in the sense orientation to at least one regulatory sequence.
 9. The chimeric gene of claim 7, wherein the at least one regulatory sequence comprises a promoter selected from the group consisting of cauliflower mosaic virus 35S promoter, the promoter for the phenylalanine ammonia lyase gene and the promoter for the p-coumaroyl CoA ligase gene.
 10. The chimeric gene of claim 7 wherein the at least one regulatory sequence comprises an endogenous plant promoter effective for controlling expression of a plant F5H gene.
 11. A transformed plant having altered guaiacyl:syringyl lignin monomer ratios relative to the ratios of an untransformed plant, comprising a host plant having incorporated therein an expressible chimeric gene causing altered guaiacyl:syringyl lignin monomer ratios in a plant cell transformed with the chimeric gene, the chimeric gene comprising an isolated nucleic-acid fragment selected from the group consisting of the nucleic acid fragment of SEQ ID NO: 1, the nucleic acid fragment of SEQ ID NO: 3 and a nucleic acid fragment of SEQ ID NO: 1 or SEQ ID NO: 3 encompassing base changes that do not alter the function of the encoded plant F5H enzyme, operably linked in either sense or antisense orientation to at least one regulatory sequence.
 12. The transformed plant of claim 11 wherein the syringyl lignin monomer content is from about 0 mol % to about 95 mol % as measured in whole plant tissue.
 13. The transformed plant of claim 12 wherein the host plant is selected from the group consisting of alfalfa (Medicago sp.), rice (Oryza sp.), maize (Zea mays), oil seed rape (Brassica sp.), forage grasses, (Arabidopsis sp.), tobacco (Nicotiana sp.), eucalyptus (Eucalyptus sp.), pine (Pinus sp.), spruce (Picea sp.) and poplar (Populus sp.).
 14. The transformed plant of claim 12, wherein the host plant is a tree crop.
 15. A method of altering the activity of F5H enzyme in a plant, comprising: (i) transforming a cell, tissue or organ from a host plant with a chimeric gene causing altered guaiacyl:syringyl lignin monomer ratios in a plant cell transformed with the chimeric gene, the chimeric gene comprising an isolated nucleic-acid fragment selected from the group consisting of the nucleic acid fragment of SEQ ID NO: 1, the nucleic acid fragment of SEQ ID NO: 3 and a nucleic acid fragment of SEQ ID NO: 1 or SEQ ID NO: 3 encompassing base changes that do not alter the function of the encoded plant F5H enzyme, operably linked in either sense or antisense orientation to at least one regulatory sequence wherein the chimeric gene is expressed; (ii) selecting a transformed cell, cell callus, somatic embryo, or seed which contains the chimeric gene; (iii) regenerating a whole plant from the selected transformed cell, cell callus, somatic embryo, or seed; and (iv) selecting a regenerated whole plant which has a phenotype selected from the group consisting of (1) accumulation of compounds derived from sinapic acid and (2) an altered syringyl lignin monomer content relative to an untransformed host plant; wherein F5H enzyme activity is altered in the plant.
 16. A method of altering the content or composition of lignin in a plant, comprising stably incorporating into the genome of the plant a chimeric gene causing altered guaiacyl:syringyl lignin monomer ratios in a plant cell transformed with the chimeric gene, the chimeric gene comprising an isolated nucleic-acid fragment selected from the group consisting of the nucleic acid fragment of SEQ ID NO: 1, the nucleic acid fragment of SEQ ID NO: 3 and a nucleic acid fragment of SEQ ID NO: 1 or SEQ ID NO: 3 encompassing base changes that do not alter the function of the encoded plant F5H enzyme, operably linked in either sense or antisense orientation to at least one regulatory sequence; wherein said incorporating is achieved by transformation means whereby the incorporated chimeric gene expresses F5H enzyme and whereby guaiacyl:syringyl lignin monomer content or composition is altered from that of the untransformed host plant.
 17. A transformed plant having altered guaiacyl:syringyl lignin monomer ratios relative to the ratios of an untransformed plant, comprising a host plant having incorporated therein an expressible chimeric gene causing altered guaiacyl:syringyl lignin monomer ratios in a plant cell transformed with the chimeric gene; wherein the chimeric gene comprises: a regulatory sequence; and a nucleic-acid fragment encoding an active plant F5H enzyme having an amino acid sequence selected from the group consisting of (1) the amino acid sequence of SEQ ID NO: 2, and (2) an amino acid sequence of SEQ ID NO: 2 encompassing amino acid substitutions, additions and deletions that do not alter the function of the active plant F5H enzyme; and wherein the nucleic-acid fragment is operably linked in either sense or antisense orientation to the regulatory sequence.
 18. The transformed plant of claim 17 wherein the syringyl lignin monomer content is from about 0 mol % to about 95 mol % as measured in whole plant tissue.
 19. The transformed plant of claim 17 wherein the host plant is selected from the group consisting of alfalfa (Medicago sp.), rice (Oryza sp.), maize (Zea mays), oil seed rape (Brassica sp.), forage grasses, (Arabidopsis sp.), tobacco (Nicotiana sp.), eucalyptus (Eucalyptus sp.), pine (Pinus sp.), spruce (Picea sp.) and poplar (Populus sp.).
 20. The transformed plant of claim 17 wherein the host plant is a tree crop.
 21. A method of altering the activity of F5H enzyme in a plant, comprising: (i) transforming a cell, tissue or organ from a host plant with a chimeric gene causing altered guaiacyl:syringyl lignin monomer ratios in a plant cell transformed with the chimeric gene, (ii) selecting a transformed cell, cell callus, somatic embryo, or seed which contains the chimeric gene; (iii) regenerating a whole plant from the selected transformed cell, cell callus, somatic embryo, or seed; and (iv) selecting a regenerated whole plant which has a phenotype selected from the group consisting of (1) accumulation of compounds derived from sinapic acid or (2) an altered syringyl lignin monomer content relative to an untransformed host plant; wherein the chimeric gene comprises a regulatory sequence and a nucleic-acid fragment encoding an active plant F5H enzyme having an amino acid sequence selected from the group consisting of (1) the amino acid sequence of SEQ ID NO: 2, and (2) an amino acid sequence of SEQ ID NO: 2 encompassing amino acid substitutions, additions and deletions that do not alter the function of the active plant F5H enzyme; wherein the nucleic-acid fragment is operably linked in either sense or antisense orientation to the regulatory sequence; and wherein F5H enzyme activity is altered in the plant.
 22. A method of altering the content or composition of lignin in a plant, comprising stably incorporating into the genome of the plant a chimeric gene causing altered guaiacyl:syringyl lignin monomer ratios in a plant cell transformed with the chimeric gene; wherein the chimeric gene comprises a regulatory sequence and a nucleic-acid fragment encoding an active plant F5H enzyme having an amino acid sequence selected from the group consisting of (1) the amino acid sequence of SEQ ID NO: 2, and (2) an amino acid sequence of SEQ ID NO: 2 encompassing amino acid substitutions, additions and deletions that do not alter the function of the active plant F5H enzyme; wherein the nucleic-acid fragment is operably linked in either sense or antisense orientation to the regulatory sequence; and wherein said incorporating is achieved by transformation means whereby the incorporated chimeric gene expresses F5H enzyme and whereby guaiacyl:syringyl lignin monomer content or composition is altered from that of the untransformed host plant.
 23. A method for altering the activity of F5H enzyme in a plant, comprising: providing an expressable chimeric gene comprising a nucleotide sequence encoding an active plant F5H enzyme having an amino acid sequence selected from the group consisting of (1) the amino acid sequence of SEQ ID NO: 2, and (2) an amino acid sequence of SEQ ID NO: 2 encompassing amino acid substitutions, additions and deletions that do not alter the function of the enzyme; and transforming a plant with the chimeric gene to provide a transformed plant, wherein the transformed plant expresses the chimeric gene, and wherein F5H enzyme activity is altered in the plant.
 24. The method according to claim 23, wherein the nucleotide sequence is selected from the group consisting of the sequence set forth in SEQ ID NO: 1 and the sequence set forth in SEQ ID NO: 3.
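
The relationship recited in the claims between the nucleotide sequence of SEQ ID NO: 1 and the deduced amino acid sequence of SEQ ID NO: 2 can be illustrated informally. The Python sketch below is not part of the disclosure: the names CODON_TABLE, SEQ1_FIRST_60 and translate_from_first_atg are hypothetical, and the sketch assumes the standard genetic code and an open reading frame that begins at the first ATG. It translates only bases 1-60 of SEQ ID NO: 1 as printed in the sequence listing.

```python
# Illustrative sketch only: helper names are hypothetical, not from the disclosure.
# Assumes the standard genetic code and a reading frame starting at the first ATG.

BASES = "tcag"
# One-letter amino acids for all 64 codons in t, c, a, g order ("*" marks a stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Bases 1-60 of SEQ ID NO: 1, copied from the sequence listing.
SEQ1_FIRST_60 = (
    "aaaaaaaaca" "ctcaatatgg" "agtcttctat"
    "atcacaaaca" "ctaagcaaac" "tatcagatcc"
)


def translate_from_first_atg(dna: str) -> str:
    """Translate from the first ATG until a stop codon or the last full codon."""
    seq = dna.lower()
    start = seq.find("atg")
    if start < 0:
        return ""  # no start codon found
    residues = []
    for pos in range(start, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[pos:pos + 3]]
        if amino_acid == "*":
            break
        residues.append(amino_acid)
    return "".join(residues)


print(translate_from_first_atg(SEQ1_FIRST_60))  # prints MESSISQTLSKLSD
```

The printed one-letter string corresponds to Met Glu Ser Ser Ile Ser Gln Thr Leu Ser Lys Leu Ser Asp, the first fourteen residues listed for SEQ ID NO: 2.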